Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Practical operability techniques for teams - webinar - Skelton Thatcher & Unicom

2,004 views

Published on

Presenters: Matthew Skelton and Rob Thatcher, Skelton Thatcher Consulting

Webinar: Operability is all about making software work well in Production. In this webinar, we explore practical, tried-and-tested, real world techniques for improving operability with many kinds of software systems, including cloud, Serverless, on-premise, and IoT: logging with Event IDs, Run Book dialogue sheets, endpoint healthchecks, correlation IDs, and lightweight User Personas.

Target audience: Software Developer, Tester, Software Architect, DevOps Engineer, Delivery Manager, Head of Delivery, Head of IT.

Benefits: Attendees will gain insights into operability and why this is important for modern software systems, along with practical experience of techniques to enhance operability in almost any software system they encounter.

Published in: Software
  • Be the first to comment

Practical operability techniques for teams - webinar - Skelton Thatcher & Unicom

  1. 1. Practical Operability Techniques for Teams Matthew Skelton & Rob Thatcher Skelton Thatcher Consulting skeltonthatcher.com / @SkeltonThatcher Unicom Seminars – webinar – 28 September 2017
  2. 2. Today 40 mins: operability techniques 10-15 mins: questions
  3. 3. Today What is operability? Modern logging Run Book dialogue sheets Endpoint healthchecks Correlation IDs User Personas for dashboards
  4. 4. Training 1-day tutorial with exercises Book via Unicom: http://www.unicom.co.uk/workshops.html
  5. 5. You Software Developer Tester / QA DevOps Engineer Team Leader ?
  6. 6. Operability: use modern logging, Run Book dialogue sheets, endpoint healthchecks, correlation IDs, and user personas as team collaboration techniques
  7. 7. About us Co-founders at Skelton Thatcher Consulting Matthew Skelton Rob Thatcher
  8. 8. Team-first digital transformation 30+ organisations UK, US, EU, India, China
  9. 9. We build modern capabilities by mentoring your teams
  10. 10. Team Guide to Software Operability Matthew Skelton & Rob Thatcher skeltonthatcher.com/publications Download a free sample chapter
  11. 11. Practical Operability Techniques for Teams
  12. 12. What is operability?
  13. 13. Operability making software work well in Production
  14. 14. Logging with Event IDs
  15. 15. logging with Event IDs reduce time to detect problems increase team engagement increase configurability enhance collaboration #operability
  16. 16. search by event Event ID {Delivered, InTransit, Arrived}
  17. 17. transaction trace Correlation ID 612999958…
  18. 18. How many distinct event types (state transitions) in your application?
  19. 19. represent distinct states
  20. 20. enum Human-readable sets: unique values, sparse, immutable C#, Java, Python, node (Ruby, PHP, …)
  21. 21. Technical Domain public enum EventID { // Badly-initialised logging data NotSet = 0, // An unrecognised event has occurred UnexpectedError = 10000, ApplicationStarted = 20000, ApplicationShutdownNoticeReceived = 20001, MessageQueued = 40000, MessagePeeked = 40001, BasketItemAdded = 60001, BasketItemRemoved = 60002, CreditCardDetailsSubmitted = 70001, // ... }
  22. 22. BasketItemAdded = 60001 BasketItemRemoved = 60002
  23. 23. log using Event IDs with a modern ‘structured logging’ library
  24. 24. Example: video processing On-demand processing of TV advertisements Ad-agency  TV broadcaster High throughput Glitch-free video & audio
  25. 25. Storage I/O Worker Job Queue Upload
  26. 26. Example: video processing Discover processing bottlenecks Trigger alerts via LogEntries / HostedGraphite Report on KPIs Target areas for improvement
  27. 27. Run Book dialogue sheets
  28. 28. Run Book dialogue sheets Checklists: typical operational considerations Team-friendly exploration
  29. 29. System characteristics Hours of operation During what hours does the service or system actually need to operate? Can portions or features of the system be unavailable at times if needed? Hours of operation - core features (e.g. 03:00-01:00 GMT+0) Hours of operation - secondary features (e.g. 07:00-23:00 GMT+0) Data and processing flows How and where does data flow through the system? What controls or triggers data flows? (e.g. mobile requests / scheduled batch jobs / inbound IoT sensor data ) …
  30. 30. http://runbooktemplate.info/
  31. 31. runbooktemplate.infoRun Book dialogue sheets
  32. 32. Endpoint healthchecks
  33. 33. endpoint healthchecks Every runnable app/service/daemon exposes /status/health An HTTP GET to the endpoint returns: 200 – "I am healthy" 500 – "I am sick"
  34. 34. endpoint healthchecks For databases and other non-HTTP components, run a lightweight HTTP service in front of the component 200 / 500 responses
  35. 35. https://github.com/Lugribossk/simple-dashboard
  36. 36. Correlation IDs
  37. 37. ‘Unique-ish’ identifier for each request Passed through downstream layers
  38. 38. Unique-ish ID
  39. 39. Synchronous HTTP: X-HEADER e.g. X-trace-id X-trace-id: 348e1cf8 If header is present, pass it on (Yes, RFC6648, but this is internal only)
  40. 40. Asynchonous (queues, etc.): Message Attributes, name:value pair e.g. "trace-id":"348e1cf8" AWS SQS: SendMessage() / ReceiveMessage() Log the Correlation ID if present
  41. 41. Example: electronic trading High speed, low latency Trading options & derivatives Connected to stock exchanges Sub-millisecond timings > £1 million per day traded
  42. 42. Correlations IDs for trading Evidence for timely operation Help identify bottlenecks Target areas for perf tuning Identify race conditions Increase operability
  43. 43. Lightweight user personas
  44. 44. Lightweight user personas: Ops Engineer Test Engineer Build & Deployment Engineer Service Owner
  45. 45. http://www.keepitusable.com/blog/?tag=alan-cooper
  46. 46. https://www.geckoboard.com/blog/visualisation-upgrades- progressing-towards-a-more-useful-and-beautiful-dashboard/
  47. 47. Lightweight user personas: What data does the User Persona need visible on a dashboard in order to make decisions rapidly & safely?
  48. 48. Summary
  49. 49. Operability making software work well in Production
  50. 50. logging with Event IDs use enum-based Event IDs to explore runtime behaviour and fault conditions
  51. 51. Run Book dialogue sheets explore and establish operational requirements as a team, around a physical table, together
  52. 52. endpoint healthchecks HTTP 200 / 500 responses to /status/health call with JSON details – good for tools and humans
  53. 53. Correlation IDs trace execution using correlation IDs: synchronous (HTTP X-trace-id) async (SQS MessageAttribute)
  54. 54. lightweight user personas explore the needs of different roles for rapid decisions via dashboards
  55. 55. Operability use modern logging, Run Book dialogue sheets, endpoint healthchecks, correlation IDs, and user personas as team collaboration techniques
  56. 56. Team Guide to Software Operability Matthew Skelton & Rob Thatcher skeltonthatcher.com/publications Download a free sample chapter
  57. 57. Training 1-day tutorial with exercises Book via Unicom: http://www.unicom.co.uk/workshops.html
  58. 58. Questions? via the webinar chat tool via Twitter: @SkeltonThatcher via email: questions@skeltonthatcher.com Unicom Seminars: info@unicom.co.uk
  59. 59. Resources • Training: Practical Operability for Developers and Testers – led by Matthew Skelton and Rob Thatcher – 1-day workshop – http://www.unicom.co.uk/practical-operability-for-developers- and-testers.html • Team Guide to Software Operability by Matthew Skelton and Rob Thatcher (Skelton Thatcher Publications, 2016) http://operabilitybook.com/ • Run Book template & Run Book dialogue sheets http://runbooktemplate.info/
  60. 60. thank you @SkeltonThatcher skeltonthatcher.com

×