It’s not a thing. It’s not even a good idea. NoOps === Developers doing deployments.
We already have a really good name for this: DevOps. I’ll take two.
There you go. That’s the end of the presentation. We can all go home now. Right? Not quite.
“Being both a developer and an operator is already tough (but critical to building good software).”
What does "DevOps" mean? I bet there are as many definitions as people in this room.
It’s my presentation, I can quote myself.
This version leaves a lot implied – like applying software development practices to infrastructure – but it’s nice and simple.
But serverless operations definitely isn't the same as "traditional" (i.e. server-based) operations. It's a different beast. Because you've traded off a bunch of things, you have to get the things you do have right. That’s what I’m about to go into.
Or “OpsLess”? Which do you guys prefer?
Some of these things will be very obvious, but in order to understand the implications, we need to consider them together.
I’m going to go through the good – and less good – things about operating serverless applications and workloads.
The reality is most of these systems are complex systems, and so cannot be easily reasoned about just in our heads. By thinking of the underlying principles holistically, we set ourselves up to make informed decisions that can be objectively evaluated.
It’s all virtualised anyway, so most of us in AWS are used to this (anyone used the new bare-metal “tin” instances?).
Good: I didn’t want to worry about hardware anyway.
Less good: As much as it pains me to say, serverless is not for all applications.
Meltdown & Spectre! Didn't need to do anything. It’s not my problem. I pay AWS (a very small amount of money) to deal with this, and the reality is they’re much, much better at it than I or you ever will be – they had it fixed before it was announced. How many on-premises networks do you think are fixed by now? When the CPUs are (maybe!) fixed, I also won't have to do anything.
I don’t even get to choose the OS! Let alone the configuration of it.
Good: Less to do/worry about. The cynic in me thinks that most businesses out there are likely to stuff their configuration up (since servers aren't their core business), so interpret as your ego allows.
Less good: If you do need to change something about the environment, you have to jump through more hoops to get it working (if it’s even possible).
Telemetry: the process of recording and transmitting the readings of an instrument.
The level of visibility you get into your AWS environment is unprecedented, and there's no good reason NOT to use the data you get. As always, the hard part is turning it into actionable information that you can use to make confident decisions.
I’ve yet to meet or hear anyone say "I just can't get enough visibility into my system" – yes, there are things you can't see directly, but you can infer them through the things you can see.
Good: Unprecedented visibility, for no work.
Less good: Now you’ve got to use it. The onus is now on YOU to use the data. This means you should be making data-driven decisions.
And why should you pay so much attention to optimising your functions?
The AWS free tier for Lambda and DDB doesn’t expire. API GW does.
Good: Free tier doesn’t expire. What this means is that below a certain level of activity, it’s not worth optimising or improving things – there’s just no cost benefit to spending your attention on it.
Less good: Pressure’s on. This is an area where almost everyone is behind, as far as I can see. Once the business gets a hold of this, they’re going to get hooked on it. Being able to objectively quantify the cost/benefit of an application to the millisecond level is going to be big, and it’s something most businesses don’t realise is possible. Everyone’s so used to sinking thousands of dollars into machines, and skilled engineers to configure them, they don’t even imagine they could get this kind of granularity.
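To make that granularity concrete, here’s a rough sketch of costing a function down to the individual invocation. The rates are illustrative (roughly AWS’s published us-east-1 Lambda prices; always check the current pricing page), and the function name is mine, not an AWS API:

```python
# Illustrative rates only -- check the current AWS Lambda pricing page.
PRICE_PER_GB_SECOND = 0.0000166667   # compute charge
PRICE_PER_REQUEST = 0.20 / 1_000_000  # request charge

def invocation_cost(memory_mb: float, duration_ms: float) -> float:
    """Estimated USD cost of a single Lambda invocation."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST

# A month of 1M invocations of a 128 MB function running 100 ms each
# comes out to well under a dollar:
monthly = 1_000_000 * invocation_cost(memory_mb=128, duration_ms=100)
```

Once cost is a function of memory and milliseconds, “is this optimisation worth it?” becomes simple arithmetic rather than a gut feel.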
Monolith function. Ugh.
The mark of a mature developer is a tendency towards simplicity.
Good: Up-front.
Less good: Can you/your organisation handle it?
If your organisation can't manage a monolith properly, how do you think introducing increased complexity is going to go?
No amount of Medium articles will prepare you for this.
Related to complexity.
If you’ve legitimately got a scenario where SemVer doesn't work, I’d love to hear it.
Use SemVer! It will work fantastically for 99% of your use-cases, and nothing will work for that other 1%. In that case, go with something you’re comfortable with, which should probably be SemVer…
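The nice thing about SemVer is that version precedence is mechanical. A minimal sketch (plain MAJOR.MINOR.PATCH only; pre-release and build metadata are out of scope here, and the function name is my own):

```python
def parse_semver(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple that compares correctly."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))

# Tuples compare element-wise, which matches SemVer precedence:
assert parse_semver("2.0.0") > parse_semver("1.9.9")
# Crucially, this is numeric, not a string compare:
assert parse_semver("1.10.0") > parse_semver("1.2.0")
```

That last assertion is exactly where ad-hoc string comparison of versions goes wrong.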
I’m guilty of not doing this right too – this is my slide to me.
This is something you have to do in traditional operations too, but because it's tied up with the server holding your state, it isn't a separate thing to do. In serverless, you have to specifically call it out, because you're prevented from storing it on your instance.
It is possible to store state in your functions – and you do it in specific scenarios – but usually you won’t have any there.
I even managed to convince the A Cloud Guru guys to mention this at one of their re:Invent talks last year.
Define idempotent: denoting an element of a set which is unchanged in value when multiplied or otherwise operated on by itself.
The flipside of an event-driven world built on distributed systems: AWS Lambda, for example, retries failed asynchronous events (two retries, three attempts in total). Those retries will happen in the event of memory or timeout issues.
Good: This will make your life sooooo much easier. Architecting for failure.
Less good: It takes some getting used to.
This is how your databases are written (e.g. transaction logs).
This was easier to manage when you controlled all of the components of the system. You don’t control them any more, so you don’t get to trace everything end-to-end.
Want to keep your jobs? Be the one giving jobs to the robots.
Good: Who wants to do the same thing over and over again?
Less good: More work up-front.
No one spins up 1000 servers and configures them 1000 different ways, unless they’re insane. They use configuration management to configure them ONE way.
I spend a lot of time explaining this to our clients.
Automation is more work up front, no doubt about it. But automation pays dividends every time after that, every time you deploy. The sooner you do it, the sooner you get the benefit, and the better it is.
Segue to CI/CD/CD.
Yes, this is an AWS slide. It’s been in a bunch of talks. Last seen: SRV302 - Building CI/CD Pipelines for Serverless Applications.
CI: It takes discipline to write tests. Most developers already know that they should, they just need to be given permission and time to do it.
CD: Most of the work is around automation. If you haven’t been through this, you’d be surprised how much work you’ll discover. At least with a serverless application you have as little as possible to manage. Full “automation” is a binary state – if your process has a manual step, it’s not automated.
CD: The Dream. Codifies knowledge, which is great for an organisation. Removes key person risk. Shifts focus from menial tasks (i.e. deployment) to a focus on business tasks and value. That’s not to say there isn’t benefit to automating parts of your deployment, but the real benefit comes when it’s all automated.
Good: Fast feedback. Fail fast. Improve developer productivity by giving feedback earlier in the development cycle. Especially crucial in a distributed environment. Yes, there’s SAM Local, but for any system of reasonable complexity, you’ll need to deploy to AWS. “Works on my machine” really doesn’t cut it for serverless…
Less good: Don’t go crazy. There are plenty of examples online of people “mocking the world” locally to test their serverless functions, and I’m just not sure it’s worth it. When I think of tests, I think in terms of return on investment. Yes, there are things you could test by mocking your function’s connection to DDB, but that’s probably not the best value in terms of finding bugs (which are more likely to be in your own code), and mimicking the complexity of AWS just sounds like a bad idea…
Less is more! Least privilege, that is. Split your functions by role, give them only the access they need. This goes hand in hand with the microservices approach.
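What “only the access they need” looks like in practice: a policy scoped to the exact actions and the exact resource one function uses. This is a sketch expressed as a Python dict (the table ARN, account ID, and variable name are illustrative; scope yours to what your function actually calls):

```python
# Least-privilege IAM policy for a function whose only job is reading
# one DynamoDB table. ARN and account ID below are made up for the example.
READ_ORDERS_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Only the two read actions the function calls -- no dynamodb:*
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            # Only the one table -- never "Resource": "*"
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        }
    ],
}
```

If this function is compromised, the blast radius is one table, read-only – which is the whole point of splitting functions by role.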
Good: Worry about your code. Your biggest attack vector is going to be your (crappy?) code.
Less good: VPC-based functions. They’re slow, they’re fiddly, but they’re the only way to (kind of) isolate your functions. Keep in mind IP allocation – while Lambda can scale, the IPs in your subnet cannot.
Embrace the things you can’t control Less to get wrong. You didn't need them anyway. The rest is AWS’ problem.
Make data-driven decisions. You have so few things to change, make sure you have a reason for changing them. Have a hypothesis, test it, and act accordingly. You have the data. If you can't justify it with data, then you should probably leave it alone (i.e. let AWS decide).
Automation is your friend Don’t repeat yourself. Unless your project is a toy.
Be clear about where your state is Make sure it’s in a good place e.g. not global variables, etc.
Make it idempotent Takes more work, but you’ll thank yourself. Will help you manage your state properly.
Get versioning right. Use SemVer.
Make backwards compatible changes Makes complexity manageable. A stitch in time saves nine.
So remember, do less, not more, because less is more. Work smarter, not harder.
A cross-disciplinary community of practice dedicated to the study of building, evolving and operating resilient systems at scale.
The team that writes the software, deploys and maintains it.
Don’t automate failure
• Write tests
• Make it idempotent
• Get versioning right
• Let costs drive choices
• Automation is your friend
• Make data driven decisions
• Be clear about where your state is
• Embrace the things you can’t control
• Make backwards compatible changes