#unidevops

Software Operability,
Run Book Collaboration,
and DevOps

Matthew Skelton
27th February 2014
DevOps Summit,
London, UK
www.devopssummit.com
@matthewpskelton
softwareoperability.com
• Software Operability
• Run Book Collaboration
• Making Operability Work
• Questions


Agenda
• Software systems since 1998
• Continuous Delivery specialist, DevOps enthusiast, Operability nut
• London Continuous Delivery meetup group - londoncd.org.uk
• Experience DevOps workshops
• PIPELINE Conference

Background

Software
Operability
• Definitions
• Examples
• Why focus on operability?
• How DevOps can help

Software Operability

Operability?
• Cognates:
– Opera
– Operate
– Operational
– Inter-operability

Etymology of Operability?

• Operability: the properties of a system which make it work well in Production


Software Operability
Since 1929,
Mallorca, Spain


Operable Systems
• David Copeland (@davetron5000):
“How your software runs in production is all that matters. The most amazing abstractions, cleanest code, or beautiful algorithms are meaningless if your code doesn’t run well on production.”

http://www.naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html


Software Operability
• Deploy
• Monitor
• Diagnose
• Debug
• Query
• Control
• Inspect
• Clear
• ...

Operational Criteria

“Non-Functional”
• Hooks (internal APIs) for:
– Logging
– Monitoring
– Diagnostics
– Health checks
– Data clear-down
– Service / daemon / container control
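A minimal sketch of what such hooks might look like inside an application, written in Python. Everything here is an assumption made for illustration: the `OperationalHooks` class, its method names, and the `orders-service` logger name do not come from the talk or from any particular library.

```python
import json
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical service name, used only for this illustration.
logger = logging.getLogger("orders-service")


@dataclass
class OperationalHooks:
    """Hypothetical internal API that exposes operational features."""

    started_at: float = field(default_factory=time.time)
    counters: Dict[str, int] = field(default_factory=dict)
    checks: Dict[str, Callable[[], bool]] = field(default_factory=dict)
    cache: Dict[str, str] = field(default_factory=dict)

    def count(self, name: str) -> None:
        """Monitoring hook: increment a named counter."""
        self.counters[name] = self.counters.get(name, 0) + 1

    def register_check(self, name: str, check: Callable[[], bool]) -> None:
        """Health-check hook: register a callable that returns True when healthy."""
        self.checks[name] = check

    def health(self) -> dict:
        """Run every registered check; a check that raises counts as failed."""
        results = {}
        for name, check in self.checks.items():
            try:
                results[name] = bool(check())
            except Exception:
                logger.exception("health check %s raised", name)
                results[name] = False
        return {"healthy": all(results.values()), "checks": results}

    def diagnostics(self) -> str:
        """Diagnostics hook: a JSON snapshot of internal state."""
        snapshot = {
            "uptime_seconds": round(time.time() - self.started_at, 1),
            "counters": self.counters,
            "cache_entries": len(self.cache),
        }
        return json.dumps(snapshot, sort_keys=True)

    def clear_down(self) -> int:
        """Data clear-down hook: empty the cache and report how much was removed."""
        removed = len(self.cache)
        self.cache.clear()
        logger.info("clear-down removed %d cache entries", removed)
        return removed
```

In a real system these hooks would typically sit behind an admin endpoint or command-line tool so that operations staff can call them without reading the code.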

Shaped by Operability

Ops Folk are Users Too!

• Deploy more rapidly, frequently
• High cost of Production outage
• Systems now more complicated

Why focus on Operability?

Outages are Embarrassing!

Operational considerations

• DevOps is one way to address poor operability
• Improved collaboration and communication between Dev teams and Ops teams
• Example: Run Book Collaboration

How DevOps can help

Run Book
Collaboration
• Feedback loops and learning
• What is a run book?
• How can run book collaboration help operability?


Run Book Collaboration
Gene Kim:
http://itrevolution.com/the-three-ways-principles-underpinning-devops/

Feedback Loops

Run Book

Templates

Example
• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
    – 2.9.1 Hot or Peak Periods
    – 2.9.2 Warm Periods
    – 2.9.3 Cool or Quiet Periods
  – 2.10 Environmental Differences
  – 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
  – 4.1 Configuration Management
• 5 System Backup and Restore
  – 5.1 Backup Requirements
    – 5.1.1 Special Files
  – 5.2 Backup Procedures
  – 5.3 Restore Procedures
• 6 Monitoring and Alerting
  – 6.1 Error Messages
  – 6.2 Events
  – 6.3 Health Checks
  – 6.4 Other Messages
• 7 Operational Tasks
  – 7.1 Deployment
  – 7.2 Batch Processing
  – 7.3 Power Procedures
  – 7.4 Routine Checks
    – 7.4.1 System Rebuilds
  – 7.5 Troubleshooting
• 8 Maintenance Tasks
  – 8.1 Maintenance Procedures
    – 8.1.1 Patching
      – 8.1.1.1 Normal Cycle
      – 8.1.1.2 Zero-Day Vulnerabilities
    – 8.1.2 GMT/BST time changes
    – 8.1.3 Cleardown Activities
      – 8.1.3.1 Log Rotation
  – 8.2 Testing
    – 8.2.1 Technical Testing
    – 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
  – 9.1 Failover
  – 9.2 Recovery
  – 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details

Example
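One way to keep a run book template useful as a collaboration tool is to track which sections Dev and Ops have worked through together. The sketch below assumes the top-level section names from the template; the `RunBook` class, its fields, and its methods are hypothetical names invented for this illustration, not part of any published template tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Top-level sections of the run book template (the table of contents itself is omitted).
TOP_LEVEL_SECTIONS: List[str] = [
    "System Overview",
    "Security and Access Control",
    "System Configuration",
    "System Backup and Restore",
    "Monitoring and Alerting",
    "Operational Tasks",
    "Maintenance Tasks",
    "Failure and Recovery Procedures",
    "Contact Details",
]


@dataclass
class RunBook:
    """Hypothetical record of a run book being filled in by Dev and Ops together."""

    system: str
    sections: Dict[str, str] = field(default_factory=dict)

    def unanswered(self) -> List[str]:
        """Sections that are missing or contain only whitespace, in template order."""
        return [
            name
            for name in TOP_LEVEL_SECTIONS
            if not self.sections.get(name, "").strip()
        ]

    def coverage(self) -> float:
        """Fraction of top-level sections that have some content."""
        answered = len(TOP_LEVEL_SECTIONS) - len(self.unanswered())
        return answered / len(TOP_LEVEL_SECTIONS)
```

The list returned by `unanswered()` could serve as an agenda for the next Dev and Ops conversation; the point is the discussion it prompts rather than the finished document.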
• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
• 3 Security and Access Control
• 4 System Configuration
• 5 System Backup and Restore
• 6 Monitoring and Alerting
• 7 Operational Tasks
• 8 Maintenance Tasks
• 9 Failure and Recovery Procedures
• 10 Contact Details

Example
2.1 Service Overview
2.2 Contributing Applications, Daemons, and Windows Services
2.3 Hours of Operation
2.4 Execution Design
2.5 Infrastructure and Network Design
2.6 Resilience, Fault Tolerance and High-Availability
2.7 Throttling and Partial Shutdown
2.8 Required Resources
2.9 Expected Traffic and Load

It’s Not Documentation

Focus on Collaboration
• Better understanding
• Better cross-team working
• Reduction in operational problems
• Fewer outages
• Reduced long-term cost-of-ownership


Outcomes
• Focus on the collaboration
• Run book is a means, not an end
• Throw it away when complete (?)
• Aim to automate more over time

• See http://runbookcollab.info/

Run Book as Collaboration

Making Operability
Work
• NFRs vs Operational Features
• Budget changes
• Organisation changes
• Responsibility changes
• Avoid on-call anti-patterns

Making Operability Work

“Non-Functional”
Features


Operational Features
• Single product backlog
– End-user + Operational features
– New features + bugs

• Product Owner on call
– Accountable for operational failures
– Seriously!

Taking Operability Seriously

• “What is your budget code?”
• Capex vs. Opex?
• Remove budget barriers to regular, effective communication

Budget changes

Niek Bartholomeus (@niekbartho) - http://niek.bartholomeus.be/

https://speakerdeck.com/niekbartho/self-organization-vs-global-optimization-a-comparison-between-traditional-and-modern-organizations
• “I’ll need to ask my manager first”
• Lack of autonomy
• Remove reporting barriers to regular, effective communication
• More at http://bit.ly/DevOpsTopologies

Organisation changes

“I just want to write code”

Mysterious Coding Tricks

On-call for Responsibility
• Too much overtime pay
• Too little overtime pay
• Rota team too small
• No training in incident response
• No team ownership of product
• No team autonomy for changes

On-call Anti-Patterns

• Team members want to help make things better
• Empowered to fix problems
• Reduce the times they are woken up


On call - Goal
• Operational Features, not “NFRs”
• Sustainable collaboration
• Sensible, fair on-call rotas
• Over-compensate in time off
• Avoid burn-out

The operability of operability

Recapitulation
Making software systems work well in Production

Software Operability
Shared focus on operability throughout the delivery cycle

Run Book Collaboration
Use DevOps team patterns for sustainable operability

Making Operability Operable

What’s Next?

• Patterns for Performance and Operability
– Ford, Gileadi, Purba, Moerman

• http://whoownsmyoperability.com/
– Recommended reading lists


Further Reading
• Release It!
– Michael Nygard (@mnygard)

• http://www.michaelnygard.com/


Further Reading
• Software Operability – How to make software work well in Production
– Due in 2014

• Sign up at OperabilityBook.com

• Discount code for DevOps Summit attendees

Operability Book
• A hands-on workshop for DevOps culture

• Forthcoming dates:
– London: 28th February 2014

• http://experiencedevops.org/


Experience DevOps
• Continuous Delivery
• ‘Unconference’ format
• Tuesday 8th April 2014
• London, UK
• http://pipelineconf.info/
• @PipelineConf

PIPELINE

Matthew Skelton
@matthewpskelton

Questions &
Discussion

softwareoperability.com
operabilitybook.com
bit.ly/DevOpsTopologies

Further Slides

The Phoenix Project

Continuous Delivery

Software operability and run book collaboration London Feb 2014