DevOps is an approach that aims to bring development and operations teams together to work on delivery of products and services to end users. It involves teams taking ownership of the full development lifecycle from code changes to monitoring. Key aspects of DevOps include source control, continuous integration, testing, deployment automation, infrastructure management, and monitoring tools. Resources for learning more about DevOps practices include industry conferences and online materials.
DevOps is a tricky subject to nail down, as each company interprets it in its own way.
It is culture. It is process. It is architecture. It is being lean and scientific. It is being cross-functional. It is alignment to a greater goal. It is continuous improvement to the extreme.
Some hire people with a DevOps title. This can really be a way to entice operations people into thinking you have a great culture.
Some companies have DevOps teams and non-DevOps teams. This is likely another way to relabel your operations or deployment-focused teams rather than a way to build cross-functional teams.
General idea of how software has been built over the years:
Some people have a meeting and come up with this great new idea for a great new piece of software.
It then goes to a product team, where they do their magic and design features, functionality, and appearance.
From there it gets into the developers’ hands, where all the real magic is done. A bunch of code is written, and this could be done in a waterfall or agile way.
It’s then shipped to the Ops folks for deployment, after the dev team meets all the checks and balances that have been put in place because of previous failures.
Aaaand something breaks, so it is sent back to dev, usually with a “your app is broken, fix it”. Dev says “well, you deployed it wrong”.
And that’s where DevOps comes in. Wouldn’t it be nice, as a dev, to know before everyone else that something is broken? Wouldn’t it be nice to control your own deployments, so that you don’t have to rely on others and can take back control?
And as an Ops person, wouldn’t it be nice to not worry about some janky code that some self-entitled developer has written?
This has changed the way we build applications.
Before, we would work with the business to gather requirements, do some architecture planning, and, if we were feeling really ambitious, maybe talk about testing and security.
Now we still go through those very important steps, but there is a new layer of questions:
How will this be deployed?
What does its infrastructure look like: is it cloud, serverless, or classic on-prem?
How are we going to log errors, usage, and other statistics to understand health?
How are we going to monitor?
This wasn’t an immediate transition; it’s something that we have been working on for close to 5 years and continue to work on.
So where did we start?
It took commitment from day one, from individuals and from leadership. We were lucky to have leaders like Jason who empower us to do these things, and I know some people aren’t as fortunate, but there are things that we started with that I believe anyone can start with.
First, source control. It seems strange to bring this up, but I have recently seen places where it isn’t in use across the board. I like the simple rule that if it’s deployed somewhere, then it should be in a repository of some sort. We use GitHub primarily because of corporate standards, but it also provides many great integrations for us.
Once we had source control all set up we could start working on builds for our code, and having those builds trigger automatically on commits to that source code.
Then we really started looking at our testing. The idea here is to make sure we have tests that are useful: if you write new code, write a test; if you fix a bug, ensure there’s a test for that scenario; and have your tests run as part of the build. The caveat here is that some integration or E2E tests can run long, so you may need to think about when they run.
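To make the "fix a bug, add a test" rule concrete, here is a minimal sketch using Python’s built-in `unittest`. The `parse_quantity` function and the thousands-separator bug are made up for illustration; they are not from our codebase.

```python
import unittest


def parse_quantity(raw: str) -> int:
    """Hypothetical function under test: parse an order quantity from user input."""
    # Imaginary bug fix: inputs like "1,000" used to crash int(), so strip the commas.
    value = int(raw.replace(",", ""))
    if value < 0:
        raise ValueError("quantity cannot be negative")
    return value


class ParseQuantityTests(unittest.TestCase):
    def test_plain_number(self):
        self.assertEqual(parse_quantity("3"), 3)

    def test_regression_thousands_separator(self):
        # Added together with the bug fix so the same failure cannot sneak back in.
        self.assertEqual(parse_quantity("1,000"), 1000)

    def test_negative_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_quantity("-1")
```

Running something like `python -m unittest` as a build step is one way to have these run on every commit; slower integration or E2E suites could go in a separate, less frequent step.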
Then after that we could start looking at our deployments. I’ve been in many scenarios where a deployment is a dev creating a package and copying and pasting it onto a server. Most of that can be automated, which creates consistency and confidence in deployments. Deployment automation really brought a new level of ownership and confidence to what we were building and deploying. At the time we were heavily involved with database code, and as most people know, deploying databases can be difficult, but taking the time to understand how to deploy safely and consistently really relieved a lot of stress from the team.
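One common way to get repeatable database deployments is versioned migrations: every change is a numbered script in source control, and the deploy applies only the ones that have not run yet. The sketch below shows the idea with Python’s built-in `sqlite3`; the table names, the `MIGRATIONS` list, and `apply_migrations` are all assumptions for illustration, not a description of the tooling we actually used.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical migrations as (version, SQL). In practice these live in source control.
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def apply_migrations(conn: sqlite3.Connection,
                     migrations: List[Tuple[int, str]]) -> List[int]:
    """Apply migrations that are not recorded yet, in version order.

    Returns the list of versions applied by this call.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}
    applied: List[int] = []
    for version, sql in sorted(migrations):
        if version in done:
            continue  # already deployed to this database
        # `with conn` commits on success and rolls back any open transaction on error.
        with conn:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_migrations (version) VALUES (?)", (version,)
            )
        applied.append(version)
    return applied
```

Because applied versions are recorded, running the same deployment twice is safe: the second run finds nothing to do. That property is a big part of where the confidence comes from.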
So now we had some process automation and control around how we built and deployed code, but what about the health of our applications while they’re running? How do we know if something is unhealthy before the customer calls? We started with centralizing error logging: most applications use a similar pattern to log errors to a single place. This is especially useful when you have workflows that go through multiple applications or services.
We also added other event logging. Understanding how applications are being used is immensely useful for providing value.
Both of these enabled us to create dashboards and alerts, which also meant that we started getting woken up instead of Ops. Yes, on-call is part of ownership, but on-call is a whole other discussion.
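As a rough sketch of the "same pattern, single place" idea, the example below uses Python’s standard `logging` module so that every application emits errors in one JSON shape. The in-memory list stands in for a real central log store, and `correlation_id`, `build_logger`, and the app names are hypothetical.

```python
import json
import logging
from typing import List


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object so every app logs the same shape."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "app": record.name,
            "message": record.getMessage(),
            # Hypothetical field used to follow one workflow across services.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


class ListHandler(logging.Handler):
    """Stand-in for a handler that ships records to a central log store."""

    def __init__(self, sink: List[str]) -> None:
        super().__init__()
        self.sink = sink

    def emit(self, record: logging.LogRecord) -> None:
        self.sink.append(self.format(record))


def build_logger(app_name: str, sink: List[str]) -> logging.Logger:
    """Give an application an error logger that writes to the shared sink."""
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.ERROR)
    logger.propagate = False
    logger.handlers.clear()
    handler = ListHandler(sink)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Two applications in the same workflow log to the same place.
central: List[str] = []
build_logger("billing", central).error(
    "charge failed", extra={"correlation_id": "order-42"}
)
build_logger("shipping", central).error(
    "label not printed", extra={"correlation_id": "order-42"}
)
```

With a shared field like the correlation ID, one query against the central store can pull up every error for a single workflow, no matter which service raised it.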
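An alert on top of those logs can be as simple as "too many errors in a short window". Here is a minimal sketch of that rule; the class name, threshold, and window size are assumptions, and a real setup would normally lean on whatever the monitoring tool provides.

```python
from collections import deque
from typing import Deque


class ErrorRateAlert:
    """Fire when more than `threshold` errors land within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def record_error(self, now: float) -> bool:
        """Record one error at time `now`; return True if the alert should fire."""
        self._timestamps.append(now)
        # Drop errors that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

Tuning the threshold and window is what decides whether someone gets woken up for a real problem or for noise.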
We also started thinking about infrastructure. We were on VMs, but these ran on blades in a data centre. We also had a big move to the cloud while this was happening, so we started taking advantage of cloud-based architecture and started using infrastructure as code (IaC).
We do feel like we’ve made some great progress, but we’re far from done. So what else could we be doing?
Zero downtime deployments. We still have systems that require downtime when deploying. Some of this is out of our control, but this is always the goal.
Wider use of feature flags. This is closely coupled with zero downtime deploys, but it can also help provide great visibility into how new features will work.
Plug for Travis:
Toronto Enterprise DevOps User Group on May 14 @ 5:30 PM EST
North Toronto Cloud and DevOps User Group on May 22 @ 10:30 AM EST
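Coming back to feature flags for a moment, here is a small sketch of a percentage rollout. The `FeatureFlags` class and the flag names are hypothetical, and a real setup would more likely read flags from configuration or a flag service than from a dictionary.

```python
import hashlib
from typing import Dict


class FeatureFlags:
    """In-memory flag store mapping a flag name to a rollout percentage (0 to 100)."""

    def __init__(self, rollout_percent: Dict[str, int]) -> None:
        self._rollout = rollout_percent

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self._rollout.get(flag, 0)  # unknown flags are treated as off
        # Hash flag and user together so each user gets a stable bucket per flag.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100  # bucket in the range 0..99
        return bucket < percent


flags = FeatureFlags({"new_checkout": 10, "dark_mode": 100})


def checkout(user_id: str) -> str:
    # The new code path ships dark and is switched on gradually, without a redeploy.
    if flags.is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Because the bucket is derived from a hash rather than a random number, the same user sees the same behaviour on every request while the percentage is slowly raised.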
More ownership. Some of our applications still have a dependency on Ops or Application Support teams. We’re not trying to put them out of a job; we’re trying to make their lives more reasonable, so that they aren’t getting woken up for silly things that we’re doing with our code.
The most important tool we use is continuous learning. There are things we started with that we have since walked away from, or that we learned are maybe not the best answer for every scenario. Constant improvement.