#unidevops

Software Operability and Run Book Collaboration

Matthew Skelton
14th November 2013
DevOps Summit
Amsterdam
www.devopssumit.com
@matthewpskelton
softwareoperability.com

Agenda
• Software Operability
• Run Book Collaboration
• Making Operability Work
• Questions

Background
• Software systems since 1998
• Build & Deployment at thetrainline.com
• London Continuous Delivery meetup group - londoncd.org.uk
• Experience DevOps workshops

Software Operability
• Operability: the properties of a system which make it work well in Production

“Non-Functional”

Operable Systems
Since 1929, Mallorca, Spain

Software Operability
• David Copeland (@davetron5000): “How your software runs in production is all that matters. The most amazing abstractions, cleanest code, or beautiful algorithms are meaningless if your code doesn’t run well on production.”
• http://www.naildrivin5.com/blog/2013/06/16/production-is-all-that-matters.html

Operational Criteria
• Deploy
• Monitor
• Diagnose
• Debug
• Query
• Control
• Inspect
• Clear
• ...

Ops Folk are Users Too!

Run Book Collaboration

Operational considerations

Run Book

Example
• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
    – 2.9.1 Hot or Peak Periods
    – 2.9.2 Warm Periods
    – 2.9.3 Cool or Quiet Periods
  – 2.10 Environmental Differences
  – 2.11 Tools
• 3 Security and Access Control
• 4 System Configuration
  – 4.1 Configuration Management
• 5 System Backup and Restore
  – 5.1 Backup Requirements
    – 5.1.1 Special Files
  – 5.2 Backup Procedures
  – 5.3 Restore Procedures
• 6 Monitoring and Alerting
  – 6.1 Error Messages
  – 6.2 Events
  – 6.3 Health Checks
  – 6.4 Other Messages
• 7 Operational Tasks
  – 7.1 Deployment
  – 7.2 Batch Processing
  – 7.3 Power Procedures
  – 7.4 Routine Checks
    – 7.4.1 System Rebuilds
  – 7.5 Troubleshooting
• 8 Maintenance Tasks
  – 8.1 Maintenance Procedures
    – 8.1.1 Patching
      – 8.1.1.1 Normal Cycle
      – 8.1.1.2 Zero-Day Vulnerabilities
    – 8.1.2 GMT/BST time changes
    – 8.1.3 Cleardown Activities
      – 8.1.3.1 Log Rotation
  – 8.2 Testing
    – 8.2.1 Technical Testing
    – 8.2.2 Post-Deployment
• 9 Failure and Recovery Procedures
  – 9.1 Failover
  – 9.2 Recovery
  – 9.3 Troubleshooting Failover and Recovery
• 10 Contact Details

Templates
• 1 Table of Contents
• 2 System Overview
  – 2.1 Service Overview
  – 2.2 Contributing Applications, Daemons, and Windows Services
  – 2.3 Hours of Operation
  – 2.4 Execution Design
  – 2.5 Infrastructure and Network Design
  – 2.6 Resilience, Fault Tolerance and High-Availability
  – 2.7 Throttling and Partial Shutdown
  – 2.8 Required Resources
  – 2.9 Expected Traffic and Load
• 3 Security and Access Control
• 4 System Configuration
• 5 System Backup and Restore
• 6 Monitoring and Alerting
• 7 Operational Tasks
• 8 Maintenance Tasks
• 9 Failure and Recovery Procedures
• 10 Contact Details

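One way to put the template sections listed above to work is to treat them as a checklist that Dev and Ops fill in together. The following is only a minimal sketch under stated assumptions: the section titles are copied from the outline, while the names `RUN_BOOK_SECTIONS`, `missing_sections`, `render_skeleton` and the dict-based draft format are hypothetical and are not part of any published run book tooling.

```python
# Top-level section titles copied from the run book outline above.
RUN_BOOK_SECTIONS = [
    "System Overview",
    "Security and Access Control",
    "System Configuration",
    "System Backup and Restore",
    "Monitoring and Alerting",
    "Operational Tasks",
    "Maintenance Tasks",
    "Failure and Recovery Procedures",
    "Contact Details",
]


def missing_sections(draft):
    """Return template sections that are absent or empty in a draft run book.

    `draft` is a hypothetical dict mapping section title -> free text.
    """
    return [
        title
        for title in RUN_BOOK_SECTIONS
        if not draft.get(title, "").strip()
    ]


def render_skeleton():
    """Render the template as a numbered plain-text skeleton to fill in."""
    lines = ["1 Table of Contents"]
    for number, title in enumerate(RUN_BOOK_SECTIONS, start=2):
        lines.append(f"{number} {title}")
    return "\n".join(lines)


# Made-up draft content, purely for illustration.
draft = {
    "System Overview": "Overview text agreed by Dev and Ops.",
    "Contact Details": "   ",  # whitespace only, so still counted as missing
}
print(missing_sections(draft))
print(render_skeleton())
```

The list returned by `missing_sections` could serve as the agenda for the next Dev and Ops conversation, which keeps the emphasis on the discussion rather than on the document.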
Focus on Collaboration

Feedback Loops
Gene Kim: http://itrevolution.com/the-three-ways-principles-underpinning-devops/

Run Book as Collaboration
• Focus on the collaboration
• Run book is a means, not an end
• Throw it away when complete (?)
• Aim to automate more over time
• See http://runbookcollab.info/

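The point about automating more over time can be made measurable by recording, for each run book step, whether it is still done by hand. This is a rough sketch, not a prescribed method: the `Step` class, the two helper functions and the sample steps are all hypothetical, with step names borrowed loosely from headings in the example outline.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    automated: bool


def automation_ratio(steps):
    """Fraction of run book steps that no longer need a human (0.0 if empty)."""
    if not steps:
        return 0.0
    return sum(step.automated for step in steps) / len(steps)


def manual_steps(steps):
    """Names of the steps still done by hand, in run book order."""
    return [step.name for step in steps if not step.automated]


# Hypothetical steps, loosely named after headings in the example outline.
steps = [
    Step("Deployment", automated=True),
    Step("Log Rotation", automated=True),
    Step("Restore Procedures", automated=False),
    Step("Failover", automated=False),
]
print(automation_ratio(steps))
print(manual_steps(steps))
```

Tracking the ratio between releases gives a simple signal of whether the run book is shrinking toward automation, and the list of manual steps suggests what to tackle next.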
Making Operability Work

“Non-Functional” Features

Operational Features

“I just want to write code”

Mysterious Coding Tricks

On-call for Responsibility

The operability of operability
• Operational Features, not “NFRs”
• Sustainable collaboration
• Sensible, fair on-call rotas
• Over-compensate in time off
• Avoid burn-out

What’s Next?

Further Reading
• Patterns for Performance and Operability
  – Ford, Gileadi, Purba, Moerman
• http://whoownsmyoperability.com/
  – Recommended reading lists

Operability Book
• Software Operability – How to make software work well in Production
  – Due early 2014
• Sign up at OperabilityBook.com
• Discount code for DevOps Summit attendees

Experience DevOps
• A hands-on workshop for DevOps culture
• Forthcoming dates:
  – Amsterdam: 15 November 2013
  – Bangalore: December 2013
  – London: February 2014 (tbc)
• http://experiencedevops.org/

Questions & Discussion
Matthew Skelton
@matthewpskelton
softwareoperability.com
operabilitybook.com

Acknowledgements
http://www.danatronics.com/sdb_apps.html
http://www.guavaworks.com/company-blog/guava-doesnt-do-cookiecutter.html
http://www.carpages.co.uk/ford/fordsand-sculptures-05-09-11.asp
http://paranoidnews.org/wpcontent/uploads/2010/10/Alien-HuntAlarm-Clock.jpg
http://particulations.blogspot.co.uk/2010/08/headingley-hole.html
http://marvel.wikia.com/Stephen_Strange_(Earth-616)

Further Slides

PIPELINE Conference
• Continuous Delivery
• Tuesday 8th April 2014
• London, UK
• http://pipelineconf.info/
• @PipelineConf

Software operability and run book collaboration - DevOps Summit, Amsterdam

Editor's Notes

  • #3 How Run Book Collaboration can help communication between Dev and Ops, especially for existing/legacy systems
  • #4 Since 2011, I have been the Build & Deployment Architect at thetrainline.com, the UK’s busiest travel booking website. Speaking regularly at conferences