How to address operational aspects effectively with Agile practices - Matthew Skelton - Agile In The City 2015

Treating operational aspects of software as 'non-functional requirements' and 'an Ops problem' rather than a core part of the software product leads to poor live service and unexplained errors in Production.
Traceability, deployability, recoverability, diagnosability, monitorability, and high-quality logging are key features of a software system, just as much as user-visible features surfaced via the UI or capabilities of an API endpoint.
However, many Product Owners understandably feel uneasy about taking on the (necessary) responsibility for prioritising operational features alongside user-visible and API features.
This session brings Scrum Masters and Product Owners up to speed on operational features and covers proven practices for improving operability in an Agile context, empowering Product Owners to make effective prioritisation choices about all kinds of product features, whether user-visible or operational.

  1. How to address operational aspects effectively with Agile practices; Agile in the City – 20th November 2015; #agileinthecity; Matthew Skelton, Skelton Thatcher Consulting, @matthewpskelton
  2. “Operational Features”: how to develop and test; prioritisation techniques; collaboration approaches
  3. availability is the best feature
  4. transforming technology and teams; Cloud, Agile, DevOps; high impact expertise
  5. transaction reporting; credit reference; FOREX; online payments
  6. Operational Features
  7. “the properties of a system which make it work well in Production”
  8. Not PIMP MY RIDE; more Greasy Mechanic
  9. Not PIMP MY RIDE; more Greasy Mechanic
  10. Terminology
  11. what happened to NFRs? (non-functional requirements)
  12. Non-Functional Functional
  13. language impact
  14. non-starter; non compos mentis; non-compete
  15. nonsense!
  16. holistic product view
  17. How did we get to this?
  18. admission: IT folk have been guilty of making operational features quite scary & mysterious
  19. long lists of requirements; crazy test plans
  20. poor explanation of needs; failure to engage stakeholders; gold-plating
  21. de-mystify operational features
  22. better approach: pragmatic and effective; rapid, safe, valuable
  23. “the properties of a system which make it work well in Production”
  24. Why value Operational Features?
  25. downtime: $$$; reputation ($$)
  26. non-linear increase in complexity and problems
  27. Internet of Things
  28. we can no longer deal manually with the scale/volume of potential problems
  29. agility and response to incidents
  30. remote car hacking: security as an operational feature
  31. “We have ‘cloud’ now” (HA + DR + Backup + Metrics + Diagnostics + …)
  32. think: “when it fails, how will we recover?”; it will fail
  33. How do we develop and test Operational Features?
  34. defined features; testable and measurable
  35. ahead lie the ‘ilities’...
  36. 1. What; 2. How to test
  37. Operational Hooks
  38. Deployment Pipeline
  39. Configurability
  40. re-read config (SIGHUP); text files in version control; inject settings – no ‘black boxes’
  41. toggle features via config; “Postcode lookup unavailable” → better UX
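
The Configurability slides can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the JSON file layout, the `Config` class, and the `postcode_lookup` toggle name are all assumptions made for the example. Settings live in a plain text file, are re-read on demand (optionally on SIGHUP), and a disabled toggle degrades to a friendly message rather than an error.

```python
import json
import signal
from pathlib import Path


class Config:
    """Settings loaded from a plain text (JSON) file kept in version control."""

    def __init__(self, path):
        self.path = Path(path)
        self.settings = {}
        self.reload()

    def reload(self, signum=None, frame=None):
        # The (signum, frame) parameters let this method double as a signal handler.
        self.settings = json.loads(self.path.read_text())

    def enabled(self, feature):
        # Unknown toggles default to "off" so a missing key fails safe.
        return bool(self.settings.get("features", {}).get(feature, False))


def install_sighup_reload(config):
    # SIGHUP does not exist on Windows, so guard the registration.
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, config.reload)


def postcode_widget(config):
    # Hypothetical UI fragment: degrade gracefully when the toggle is off.
    if config.enabled("postcode_lookup"):
        return "<input name='postcode'>"
    return "Postcode lookup unavailable"
```

With `install_sighup_reload(config)` called once at start-up on a POSIX system, sending `kill -HUP <pid>` would re-read the file without a restart.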
  42. Deployability
  43. immutable artefacts; concurrent releases (SxS); symlinks
  44. rapid; scriptable; simple failure modes
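
One common way to combine side-by-side releases with symlinks is to unpack each immutable release into its own directory and switch a `current` link atomically. The sketch below assumes a POSIX filesystem and a hypothetical `releases/<version>` layout; the function name is illustrative only.

```python
import os


def activate_release(releases_dir, version, current_link):
    """Point `current_link` at releases_dir/version using an atomic rename."""
    target = os.path.join(releases_dir, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(f"release not unpacked: {target}")
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted switch
    os.symlink(target, tmp_link)
    # On POSIX, os.replace is a rename, so readers see either the old or new link.
    os.replace(tmp_link, current_link)
    return target
```

Because earlier release directories are left untouched, rolling back is the same call with the previous version, which keeps the failure mode simple.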
  45. Maintainability
  46. holding page as MVP!
  47. live system component diagrams
  48. modularity; ability to upgrade; version numbering (SemVer?)
  49. BasketItemAdded; grep BasketItem
  50. logging for insights
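
The `BasketItemAdded` / `grep BasketItem` slide suggests giving each log event a unique, searchable name. A minimal sketch of that idea, assuming an enumeration of event names; every name other than `BasketItemAdded` is hypothetical:

```python
import enum
import io
import logging


class BasketEvent(enum.Enum):
    # One unique, searchable name per event type (all but the first are hypothetical).
    BasketItemAdded = "BasketItemAdded"
    BasketItemRemoved = "BasketItemRemoved"
    BasketCheckedOut = "BasketCheckedOut"


def log_event(logger, event, **fields):
    # Sorted key=value pairs keep lines stable and easy to parse downstream.
    details = " ".join(f"{key}={value}" for key, value in sorted(fields.items()))
    logger.info("%s %s", event.name, details)


def make_logger(stream):
    # Dedicated logger writing bare messages to `stream`.
    logger = logging.getLogger("basket-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.StreamHandler(stream))
    return logger
```

Searching the output for `BasketItem` then finds every add and remove event without matching unrelated lines.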
  51. Testability
  52. every component has a /health endpoint
  53. stubbed/mocked/faked endpoints; test things individually
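
A `/health` endpoint can be very small. The sketch below uses only the Python standard library; the response shape `{"status": "ok"}` and the helper names are assumptions for illustration, and a real component would also report on its own dependencies. The same server can serve as a stubbed endpoint when testing a caller in isolation.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            status, payload = 200, {"status": "ok"}
        else:
            status, payload = 404, {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def start_server(port=0):
    # Port 0 asks the OS for any free port; read it from server.server_address.
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def check_health(base_url, timeout=2.0):
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url + "/health", timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))
```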
  54. Recoverability
  55. asynchronous service start; expect services to be erroring; logs are not wiped (rotated: okay); avoid flooding logs
  56. no nasty zombies after failures; MTTR more important than MTBF* (* for most kinds of F)
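
"Avoid flooding logs" can be approached in several ways. One simple option, shown here as an assumption rather than the speaker's recommendation, is a `logging.Filter` that lets each distinct message through only a limited number of times. The class name and the limit are hypothetical, and a long-running service would probably reset the counts per time window.

```python
import io
import logging


class RepeatLimiter(logging.Filter):
    """Let each distinct (level, message) pair through at most `limit` times."""

    def __init__(self, limit=3):
        super().__init__()
        self.limit = limit
        self.counts = {}

    def filter(self, record):
        key = (record.levelno, record.getMessage())
        self.counts[key] = self.counts.get(key, 0) + 1
        # Returning False drops the record; True lets it be emitted.
        return self.counts[key] <= self.limit
```

Attached to a handler with `handler.addFilter(RepeatLimiter(limit=3))`, a dependency that fails thousands of times in a loop produces three lines instead of filling the disk.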
  57. Performance
  58. run key 'hotspot' areas early; use a deployment pipeline ‘critical path’
  59. early pipeline tests act as a barometer for later performance problems
  60. derive transit time metrics
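
The slide does not define "transit time"; assuming it means the time a transaction spends between processing stages, the metric can be derived from timestamps that are already being logged. The stage names in the usage note are hypothetical.

```python
from datetime import datetime


def transit_times(stage_timestamps):
    """Return seconds spent between consecutive stages, ordered by timestamp."""
    ordered = sorted(stage_timestamps.items(), key=lambda item: item[1])
    return {
        f"{earlier}->{later}": (t_later - t_earlier).total_seconds()
        for (earlier, t_earlier), (later, t_later) in zip(ordered, ordered[1:])
    }
```

For example, passing timestamps for `received`, `validated`, and `stored` yields one duration for `received->validated` and one for `validated->stored`, which can be tracked from the earliest pipeline stage onwards.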
  61. Monitorability
  62. stream of metrics; transaction tracing
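
Transaction tracing usually relies on attaching one identifier to every log line produced while handling a transaction. A minimal sketch using the standard library's `logging.LoggerAdapter`; the `txn` field name and the helper functions are assumptions for illustration:

```python
import io
import logging
import uuid


def new_transaction_id():
    # 32 hex characters, generated once at the edge of the system.
    return uuid.uuid4().hex


def build_logger(stream):
    logger = logging.getLogger("tracing-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("txn=%(txn)s %(message)s"))
    logger.addHandler(handler)
    return logger


def for_transaction(logger, transaction_id):
    # LoggerAdapter injects the same `txn` field into every record it emits.
    return logging.LoggerAdapter(logger, {"txn": transaction_id})
```

Passing the same identifier to downstream services lets one search reconstruct the whole path of a transaction.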
  63. Resilience
  64. assume missing or failing; Chaos Monkey; don’t crash on HTTP 503
  65. Saboteur + deployment pipeline
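
"Don't crash on HTTP 503" means treating a failing dependency as an expected case. In this sketch the `fetch` callable stands in for a real HTTP call to a hypothetical postcode service; it is assumed to raise `urllib.error.HTTPError` or `urllib.error.URLError` the way `urllib.request.urlopen` does.

```python
import urllib.error


def lookup_postcode(fetch, postcode):
    """Call `fetch(postcode)`; degrade to a fallback if the dependency is down."""
    unavailable = {"available": False, "addresses": []}
    try:
        return {"available": True, "addresses": fetch(postcode)}
    except urllib.error.HTTPError as error:
        # HTTPError must be caught before URLError, which is its parent class.
        if error.code == 503:
            return unavailable
        raise
    except urllib.error.URLError:
        # Covers connection refused, DNS failure and similar "missing" cases.
        return unavailable
```

Injecting `fetch` also makes the failure path easy to exercise in a pipeline test without a real outage.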
  66. Scalability
  67. concurrent workers; queues and bottlenecks; throttling is your friend
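
One way to make throttling explicit is a bounded queue that rejects work once it is full, instead of letting a backlog grow without limit. This is a single-threaded sketch with hypothetical names; in a real system the `drain` step would be performed by concurrent worker threads or processes.

```python
import queue


class ThrottledQueue:
    """Bounded work queue that refuses new jobs when `max_pending` is reached."""

    def __init__(self, max_pending):
        self._queue = queue.Queue(maxsize=max_pending)
        self.rejected = 0

    def submit(self, job):
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            # Rejecting early lets callers back off or show a "busy" message.
            self.rejected += 1
            return False

    def drain(self, worker):
        # Process everything currently queued and return the results in order.
        results = []
        while True:
            try:
                job = self._queue.get_nowait()
            except queue.Empty:
                return results
            results.append(worker(job))
```

The `rejected` counter is itself a useful metric: a rising value points at the bottleneck before users report it.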
  68. Security and ‘securability’
  69. securability by practice; SSL certs & Heartbleed
  70. Gauntlt + deployment pipeline
  71. Availability
  72. “available but unusable”; synthetic transactions
  73. special HTTP header: trigger additional metrics/reporting
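
The "special HTTP header" idea can be sketched as a request handler that returns extra diagnostics only when a synthetic transaction asks for them. The header name `X-Diagnostics`, the value `1`, and the response shape are all assumptions made for this example.

```python
import time

DIAGNOSTIC_HEADER = "X-Diagnostics"  # hypothetical header name


def handle_request(headers, do_work, clock=time.perf_counter):
    """Run `do_work` and attach timing data if the diagnostic header is set."""
    started = clock()
    result = do_work()
    response = {"status": 200, "body": result}
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    if normalised.get(DIAGNOSTIC_HEADER.lower()) == "1":
        response["diagnostics"] = {"elapsed_seconds": clock() - started}
    return response
```

A scheduled synthetic transaction can send the header and alert when the service responds but is too slow to be usable, while ordinary requests carry no extra overhead in the response.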
  74. How the organisation affects Operational Features
  75. Budgets
  76. bonuses: story points delivered; tickets closed
  77. Capex vs Opex; tax breaks
  78. avoiding the Capex/Opex evil
  79. Developers seen as more valuable than Ops people; 3x hiring bonus for Devs (!)
  80. improved awareness in product teams
  81. share ownership and decision making
  82. features end-user operational end-user
  83. single product backlog
  84. Product Owner on call for incidents
  85. tricky! high degree of maturity; honesty about the product
  86. Product Owner and Tech Lead are both on the hook for outages
  87. 15-30% ‘tax’ on product budget for operational aspects
  88. AVOID ‘user features’ always taking precedence over ‘operational features’
  89. How to evaluate Operational Features vs User Features
  90. treat Ops team folk as another user persona
  91. alternatives to User Stories?
  92. NOT: "as a logging subsystem, I want..."
  93. Metrics
  94. Live: downtime, A/B for operational aspects (speed); Pre-live: time spent re-deploying
  95. Metrics for better conversations
  96. metric-ify your delivery and test infrastructure → 99.99% uptime, but 20 redeployments every time
  97. Heuristics for operational features: 30% of total product budget; 30% of dev team time
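
To make an uptime figure such as 99.99% concrete in a prioritisation conversation, it helps to convert it into a downtime allowance. The helper below is a simple arithmetic sketch with an illustrative name: 99.99% availability over a 365-day year allows roughly 52.56 minutes of downtime.

```python
def allowed_downtime_minutes(availability, period_days=365):
    """Minutes of downtime permitted by an availability target over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period
```

The uptime number says nothing about how many redeployments it took to get a release live, so it is worth reporting pre-live metrics such as re-deployment counts alongside it.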
  98. Improving operational awareness
  99. Run Book Collaboration
  100. Run Book
       • Detailed description of how the system operates
       • Maintenance
       • Repair
       • Error recovery
  101. Run Book / Ops Manual
       • 1 Table of Contents
       • 2 System Overview
         • 2.1 Service Overview
         • 2.2 Contributing Applications, Daemons, and Windows Services
         • 2.3 Hours of Operation
         • 2.4 Execution Design
         • 2.5 Infrastructure and Network Design
         • 2.6 Resilience, Fault Tolerance and High-Availability
         • 2.7 Throttling and Partial Shutdown
         • 2.8 Required Resources
         • 2.9 Expected Traffic and Load
           • 2.9.1 Hot or Peak Periods
           • 2.9.2 Warm Periods
           • 2.9.3 Cool or Quiet Periods
         • 2.10 Environmental Differences
         • 2.11 Tools
       • 3 Security and Access Control
       • 4 System Configuration
         • 4.1 Configuration Management
       • 5 System Backup and Restore
         • 5.1 Backup Requirements
           • 5.1.1 Special Files
         • 5.2 Backup Procedures
         • 5.3 Restore Procedures
       • 6 Monitoring and Alerting
         • 6.1 Error Messages
         • 6.2 Events
         • 6.3 Health Checks
         • 6.4 Other Messages
       • 7 Operational Tasks
         • 7.1 Deployment
         • 7.2 Batch Processing
         • 7.3 Power Procedures
         • 7.4 Routine Checks
           • 7.4.1 System Rebuilds
         • 7.5 Troubleshooting
       • 8 Maintenance Tasks
         • 8.1 Maintenance Procedures
           • 8.1.1 Patching
             • 8.1.1.1 Normal Cycle
             • 8.1.1.2 Zero-Day Vulnerabilities
           • 8.1.2 GMT/BST time changes
           • 8.1.3 Cleardown Activities
             • 8.1.3.1 Log Rotation
         • 8.2 Testing
           • 8.2.1 Technical Testing
           • 8.2.2 Post-Deployment
       • 9 Failure and Recovery Procedures
         • 9.1 Failover
         • 9.2 Recovery
         • 9.3 Troubleshooting Failover and Recovery
       • 10 Contact Details
  102. Run Book / Ops Manual
       • 2.1 Service Overview
       • 2.2 Contributing Applications, Daemons, and Windows Services
       • 2.3 Hours of Operation
       • 2.4 Execution Design
       • 2.5 Infrastructure and Network Design
       • 2.6 Resilience, Fault Tolerance and High-Availability
       • 2.7 Throttling and Partial Shutdown
       • 2.8 Required Resources
       • 2.9 Expected Traffic and Load
  103. Run Book collaboration: Dev team is responsible for the first draft; “But I know nothing about Production!”; encourages collaboration with Ops team
  104. Will Gray
  105. not documentation; build trust and understanding; automate more over time; http://runbookcollab.info/
  106. choose tools that encourage collaboration
  107. http://rashidkpc.github.io/Kibana/images/screenshots/searchss.png
  108. “How does [the use of] this tool help people to collaborate*?” (* work together, at the same keyboard/screen)
  109. ‘How to choose tools for DevOps and Continuous Delivery’ http://bit.ly/ChooseDevOpsTools
  110. test early and often for operational readiness
  111. operational readiness: network testing; security testing; performance testing; auxiliary infrastructure testing (monitoring, log aggregation, …)
  112. small set of rapid ‘weathervane’ tests for early warning
  113. Network testing
       • iTrinegy network emulators: scripted setup and automated test runs; http://www.itrinegy.com/
       • Saboteur: network fault injection tool; https://github.com/tomakehurst/saboteur
  114. Security testing; Gauntlt: http://gauntlt.org/; SSL certs, HTTP, SQL injection, …

         # nmap-simple.attack
         Feature: simple nmap attack to check for open ports

           Background:
             Given "nmap" is installed
             And the following profile:
               | name     | value       |
               | hostname | example.com |

           Scenario: Check standard web ports
             When I launch an "nmap" attack with:
               """
               nmap -F <hostname>
               """
             Then the output should match /80.tcp\s+open/
             Then the output should not match:
               """
               25\/tcp\s+open
               """

  115. (excerpt from the previous slide)

         When I launch an "nmap" attack with:
           """
           nmap -F <hostname>
           """
         Then the output should match /80.tcp\s+open/

  116. Deployment pipeline: make operational testing visible
  117. holistic product view
  118. MVP: ‘service unavailable’ page
  119. test early for operational features using a deployment pipeline
  120. single product backlog: (user) features + (operational) features
  121. availability is the best feature
  122. further reading: operabilitybook.com; operationalfeatures.com
  123. thank you; http://skeltonthatcher.com/; enquiries@skeltonthatcher.com; @SkeltonThatcher; +44 (0)20 8242 4103