2. What the heck is DevOps?
And why should I care?
And oh god, the buzzwords: ‘we’re transforming digital culture through disruptive innovation’. Please stahhp.
I’m going to start with some background about how things used to be (and tragically still are, in some parts)
Devs prioritise features
Ops prioritise stability
Sec advocates for unplugging the whole darn thing
Business wants features AND stability AND security - doesn’t know who to believe
What’s the result of this?
No-one gets what they want. Least of all business.
So what’s the solution to this?
There are many different definitions of DevOps, but I like the simplicity of CAMS: Culture, Automation, Measurement, Sharing. (Or CLAMS, if you want to include Lean and like saying CLAMS)
‘Culture’ gets bandied around a lot. I think it’s about feeling safe to be yourself at work. Knowing your boss and your team-mates have your back. And being the kind of team-mate that people want to be grouped with - assume the best of people, try to be egoless, have strong opinions but hold them weakly.
Everything about your application should be in code and version controlled - the app, the config, the infrastructure, the tests, the pipeline. That way, we reduce errors. No more ‘my app worked in dev, why doesn’t it work in prod?’. Plus, it’s all immediately deployable. Some of our full production stacks can be deployed to prod in 3 minutes.
Use science to prove you’re doing the right thing.
Simple level - where is the manual process taking the longest time?
Advanced - A/B testing, setting pre-conditions for chaos engineering
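To make the simple level concrete, here is a minimal Python sketch, assuming you have jotted down how long each manual step took over a few releases. The step names and timings are made up for illustration; they are not real numbers from our pipeline.

```python
from statistics import median

# Hypothetical timings (in minutes) for each manual step, one entry per release.
step_timings = {
    "build artifact": [12, 15, 11],
    "copy to server": [35, 50, 42],
    "smoke test": [20, 18, 25],
}


def slowest_step(timings):
    """Return (step name, median minutes) for the step with the highest median."""
    medians = {step: median(values) for step, values in timings.items()}
    step = max(medians, key=medians.get)
    return step, medians[step]


step, minutes = slowest_step(step_timings)
print(f"Slowest step: {step} (median {minutes} min)")
```

The median is used rather than the mean so that one unusually bad release doesn't decide where you automate first.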
We had an incident recently. The postmortem was entirely blameless: we learned from the mistake rather than sweeping it under the rug, and we’re making sure we don’t make it again.
That’s a lovely story, James
What happened in real life?
Start with the good stuff, move on to the challenges
Manually putting a WAR file on a server
The worst part - it failed under high load. At the times when you most wanted it to be available - big bushfire events - it would collapse under the pressure.
Full disclosure - I was playing deep right field for a lot of this work. I had only just joined GA and my contribution was pretty minimal. So a big shout out to the teammates that made this happen.
We automated the whole deployment pipeline. That meant everything in infracode - the static website, the machine images, the networking, the deployment pipeline itself.
Automated machine image in Packer
Executes bootstrap.sh
Terraform definition of the infrastructure. Just ‘terraform apply’ and you have a whole system.
And here’s the deployment pipeline.
So what’s the point of all this work? Sure, we had the uptime
Deploys now take 3 minutes and can be done by one person (with a PR) - the time between new work arriving (new feature, security change, bug fix) and the work being in production is massively reduced
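If you want to put a number on that gap, one rough way is to record when each piece of work arrived and when it reached production, then look at the median. This is a sketch only: the timestamps below are invented for illustration, and `lead_times_minutes` and `median_lead_time` are hypothetical helper names, not part of any tool we use.

```python
from datetime import datetime
from statistics import median

# Hypothetical (arrived, in_production) timestamps for three changes.
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 40)),
    (datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 16, 0)),
    (datetime(2024, 3, 6, 10, 0), datetime(2024, 3, 6, 10, 25)),
]


def lead_times_minutes(changes):
    """Minutes between work arriving and work being in production, per change."""
    return [(done - arrived).total_seconds() / 60 for arrived, done in changes]


def median_lead_time(changes):
    """Median lead time in minutes across all recorded changes."""
    return median(lead_times_minutes(changes))


print(f"Median lead time: {median_lead_time(changes):.0f} min")
```

Tracking this before and after automating is what lets you say ‘massively reduced’ with a straight face.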
Blockers: Security, Architects, lovers of governance, “traditional thinkers”
Buy-in (and continued buy-in) from management - especially middle management
Momentum: when the prime drivers leave, how do you keep driving? How to align incentives?
Attribution bias, no respect for conservatism
My desire to get shit done has burned some people
How do you maintain inclusivity when including some people will paralyse the whole project?
Attribution bias, bias about capability, checking privilege
My need to get shit done has burned some people