Serverless - DevOps Lessons Learned From Production

About Me - Steve Hogg
● Past - Founder of Web Drive Hosting Company
○ I ran the Operations team for shared hosting platforms with 20,000+ websites circa 2003
○ I ran large scale migrations of customer platforms to cloud based servers
● Now - AWS Contractor
○ Built many Serverless platforms over the last 2 years with IoT Labs and Blockchain Labs
○ Honed my Dev skills with TDD, integration testing, functional testing etc.

Serverless - DevOps Lessons Learned
This presentation:
● Concepts - Abstract Ideas
● Demos - Those ideas in action

What is Serverless
● Services where you can pretend that servers don’t exist
○ They do, but you’re not responsible for them
● Not just Functions-as-a-Service (FaaS)
○ Storage
○ Databases
■ DynamoDB
■ Athena
○ Queues
○ API services like API Gateway and AppSync
○ 3rd party services

Some Typical Uses
● Functions called directly
● Worker processes that read from queues
● The backend for APIs
● Single Page Apps
● Functions that customise cloud services
○ Pre and post event triggers
● Often used in combination with third-party services e.g. Auth0

Resources
Things I got started with

The “Evolution” of Application Platforms
https://www.youtube.com/watch?v=oE5lrNn7bAg - Yochay Kiriaty

Serverless Security
https://www.youtube.com/watch?v=CiyUD_rI8D8
Dependency vulnerability analysis: https://snyk.io/

Serverless - NoOps Is A Myth
“The cost and pain of developing software is approximately zero compared to the
operational cost of maintaining it over time.” - Charity Majors
https://www.youtube.com/watch?v=hG39tB5qqMc
● NoOps is a myth, Operations has just changed focus.
○ Serverless === MoreOps; you give up observability e.g. you can’t strace
○ The system still needs Operations
■ You need logging, monitoring, backups, triage (e.g. slow queries), pipelines
○ 10 to 30% of your budget should be spent on observability tools
○ Plan for failure e.g. fail-safe queues between services

State Machine Services - e.g. Step Functions

Demos 2
State Machines Examples

Lessons Learned - How Do You Eat An Elephant?
● Minimise the size of the learning pit: get started with low-hanging fruit
○ Worker jobs that get triggered with events are a good choice
■ Run on a timer - e.g. cron
■ Run when an image is uploaded
■ Process messages in a queue
The Learning Pit - Dr John Edwards
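For the queue case, a worker function could look something like the following. This is a minimal sketch, assuming an AWS Lambda function subscribed to an SQS queue with partial batch responses (ReportBatchItemFailures) enabled. `process_message` and the `image_key` field are hypothetical placeholders and do not come from the talk.

```python
import json


def process_message(payload: dict) -> None:
    """Hypothetical business logic; raises if the payload is unusable."""
    if "image_key" not in payload:
        raise ValueError("missing image_key")


def handler(event: dict, context: object = None) -> dict:
    """Process each queue record and report only the ones that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Only the failed message IDs are returned, so the queue can
            # redeliver them without replaying the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

The function has no server, framework or scheduler of its own, which is what makes this kind of job a small first step.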

Lessons Learned - 1 - Getting Started
● IaC is great; deploy examples and learn from them
○ https://github.com/awslabs/serverless-application-model/tree/master/examples/apps
● Embrace the concept of time-to-value
○ Have a bias towards efficiency - “What is the smallest amount of work I can do to get this
working?”
○ Not due to laziness, but in the interest of getting a great result quickly
○ This makes you look for existing services you can compose into what you need
○ This makes you “sharpen your tools”, to learn more efficient ways of working
○ Be result focussed and take charge of the fundamental attribution error
■ We judge ourselves on our intentions, everyone else judges us on the result
■ The only powerful position we can take is to judge ourselves on the result too
● Have a Serverless-first approach
○ Put effort into coming up with a Serverless solution, as the up-front effort pays off

Lessons Learned - 2 - Getting Started
● Avoid premature optimisation
● Build things in multiple passes, adding layers of sophistication as you go
● It is OK to start building things with the console first before moving to IaC *
● It is OK to deploy IaC manually first before automating it with a pipeline *
● It is OK to prototype things without TDD when you only have a fuzzy idea *
● It is OK to have low-coverage integration tests at first *
● * Communication is critical. The whole team needs to buy into the per-service
level of engineering being applied, and that refactoring is a natural part of the
process.

Lessons Learned - 2 continued - Agreed Levels
Define and agree on the per-service level of engineering being targeted i.e. quality
Skill levels: Beginner, Novice, Competent, Proficient, Expert
Engineering levels: Level 1 - Prototype, Level 2 - Beta, Level 3 - Small Scale, Level 4, Level 5
Levels 1 to 3:
- TDD (Level 1), + TDD (50%) (Level 2), + TDD (80%) (Level 3)
- Integration Tests
- Smoke Tests
- Functional Tests
+ Continuous Integration
- Continuous Delivery
- Continuous Deployment
+ Infrastructure-as-Code
- Feature flags
- Software publish as modules
Level 4:
+ TDD (90%)
+ Integration Tests
+ Smoke Tests
+ Functional Tests
+ Continuous Integration
+ Continuous Delivery
- Continuous Deployment
+ Infrastructure-as-Code
- Feature flags
- Software publish as modules
Level 5:
+ TDD (90%)
+ Integration Tests
+ Smoke Tests
+ Functional Tests
+ Continuous Integration
+ Continuous Delivery
+ Continuous Deployment
+ Infrastructure-as-Code
+ Feature flags
+ Software publish as modules

Lessons Learned - 3
● Deploy. Lots.
○ The whole system is defined as an IaC template, which is easy to deploy.
○ You only pay for resources when you use them.
○ Allows you to create multiple environments like Dev, Staging, UAT, Prod.
○ You can spin up an environment, run integration tests and a report, then tear it all down.
○ A developer can deploy their own copy to get familiar with the system.
○ You can start up a new environment for a customer demo quickly.
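The "spin up, test, tear down" cycle can be sketched as a context manager that always removes the environment, even when the tests fail. This is an illustration only: `fake_deploy`, `fake_teardown`, `run_integration_tests` and the `example.invalid` endpoint are hypothetical stand-ins for whatever deploys and deletes your IaC stack.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def ephemeral_environment(
    name: str,
    deploy: Callable[[str], str],
    teardown: Callable[[str], None],
) -> Iterator[str]:
    """Deploy an environment, hand back its endpoint, always tear it down."""
    endpoint = deploy(name)
    try:
        yield endpoint
    finally:
        teardown(name)


events: List[str] = []


def fake_deploy(name: str) -> str:
    events.append(f"deploy {name}")
    return f"https://{name}.example.invalid"


def fake_teardown(name: str) -> None:
    events.append(f"teardown {name}")


def run_integration_tests(endpoint: str) -> bool:
    events.append(f"test {endpoint}")
    return True


with ephemeral_environment("ci-1234", fake_deploy, fake_teardown) as endpoint:
    passed = run_integration_tests(endpoint)
```

Because the teardown sits in a `finally` block, a failed test run does not leave a stray environment behind.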

Lessons Learned - 4
● You can use multiple accounts as an alternative to multi-tenancy
○ Customers can pay for their own resources
○ Customers are responsible for their own Operations (backups, account security etc.)
● Use multiple accounts as a permission boundary
○ Lock down sensitive data, Lambdas and state machines into their own account, and use
cross-account permissions to allow another named account to use them.
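As a rough illustration of a cross-account permission, the statement below is the kind of resource-based policy entry that lets one named account invoke a Lambda function owned by another. The account IDs, region and function name are made up for the example, and in practice the statement is usually created through IaC or the CLI rather than by hand.

```python
import json


def cross_account_invoke_statement(function_arn: str, caller_account_id: str) -> dict:
    """Build a policy statement allowing one named account to invoke a function."""
    return {
        "Sid": f"AllowInvokeFrom{caller_account_id}",
        "Effect": "Allow",
        # Only the named account is trusted; everything else stays locked down.
        "Principal": {"AWS": f"arn:aws:iam::{caller_account_id}:root"},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
    }


statement = cross_account_invoke_statement(
    "arn:aws:lambda:ap-southeast-2:111111111111:function:sensitive-data-reader",
    "222222222222",
)
print(json.dumps(statement, indent=2))
```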

Lessons Learned - 5
● Version your templates
○ It is useful to have a parameter in the template that has the version number.
○ This helps when you have multiple deployments
○ It can be given to the functions too via environment variables e.g. read the version number
from package.json and include it as a param
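One way to wire this up is a small deploy helper that reads the version from package.json and turns it into template parameters. This is a sketch under assumptions: the parameter names `TemplateVersion` and `Stage` are hypothetical, and the `Key=Value` strings are in the form accepted by `aws cloudformation deploy --parameter-overrides`.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def read_version(package_json: Path) -> str:
    """Return the "version" field of a package.json file."""
    return json.loads(package_json.read_text(encoding="utf-8"))["version"]


def parameter_overrides(version: str, stage: str) -> List[str]:
    """Build Key=Value parameter strings for the template."""
    return [f"TemplateVersion={version}", f"Stage={stage}"]


# Demo with a throwaway package.json so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "package.json"
    pkg.write_text(json.dumps({"name": "demo-service", "version": "1.4.2"}), encoding="utf-8")
    overrides = parameter_overrides(read_version(pkg), "dev")

print(overrides)
```

Inside the template, the same parameter can be mapped to a function environment variable so each deployment can report which version it is running.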

Lessons Learned - 6
● Have do and undo actions e.g. Saga Pattern
○ https://theburningmonk.com/2017/07/applying-the-saga-pattern-with-aws-lambda-and-step-functions/
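The do/undo idea can be sketched in plain Python. This is a simplified illustration of the Saga pattern, not the Step Functions implementation from the linked article, and the booking functions are hypothetical.

```python
from typing import Callable, List, Tuple

Action = Callable[[dict], None]
Step = Tuple[str, Action, Action]  # (name, do, undo)


def run_saga(steps: List[Step], state: dict) -> bool:
    """Run each step; on failure, undo the completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        _, do, _ = step
        try:
            do(state)
        except Exception:
            for _, _, undo_done in reversed(completed):
                undo_done(state)
            return False
        completed.append(step)
    return True


def book_flight(state: dict) -> None:
    state["log"].append("flight booked")


def cancel_flight(state: dict) -> None:
    state["log"].append("flight cancelled")


def book_hotel(state: dict) -> None:
    raise RuntimeError("hotel service unavailable")


def cancel_hotel(state: dict) -> None:
    state["log"].append("hotel cancelled")


state = {"log": []}
ok = run_saga(
    [("flight", book_flight, cancel_flight), ("hotel", book_hotel, cancel_hotel)],
    state,
)
```

Here the hotel booking fails, so only the flight booking is undone. The hotel's undo action is never called because its do action did not complete.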

Lessons Learned - 7 - Use Fail-Safe Queues
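The slide gives only the title, so the sketch below assumes one common reading of a fail-safe queue: messages wait in a queue between services, failed messages are retried, and after a retry limit they are moved to a dead-letter queue instead of being lost. The in-memory `deque` and the `MAX_RECEIVES` limit are stand-ins for a managed queue service and its redrive settings.

```python
from collections import deque
from typing import Callable, Deque, List

MAX_RECEIVES = 3  # assumed retry limit before a message is set aside


def drain(queue: Deque[dict], dead_letters: List[dict], handle: Callable[[str], None]) -> None:
    """Process the queue, retrying failures and dead-lettering repeat offenders."""
    while queue:
        message = queue.popleft()
        message["receives"] += 1
        try:
            handle(message["body"])
        except Exception:
            if message["receives"] >= MAX_RECEIVES:
                dead_letters.append(message)  # kept for triage, not dropped
            else:
                queue.append(message)  # put back for another attempt


processed: List[str] = []


def handle(body: str) -> None:
    if body == "bad":
        raise ValueError("cannot process message")
    processed.append(body)


queue: Deque[dict] = deque({"body": b, "receives": 0} for b in ["ok", "bad"])
dead_letters: List[dict] = []
drain(queue, dead_letters, handle)
```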

Lessons Learned - 8
● Pub/Sub is a nice way to decouple services
○ https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
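The decoupling can be shown with a tiny in-memory broker. This is a sketch only: in a Serverless system the broker would be a managed notification or event service, and the topic and subscriber names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """Minimal topic-based publish/subscribe broker."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


invoices: List[int] = []
emails: List[int] = []

broker = Broker()
broker.subscribe("order.created", lambda msg: invoices.append(msg["order_id"]))
broker.subscribe("order.created", lambda msg: emails.append(msg["order_id"]))

delivered = broker.publish("order.created", {"order_id": 42})
```

The publisher knows only the topic name, so a new subscriber can be added without changing the publishing service.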

Lessons Learned - 9
● Allow resume from any state
○ https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/

Lessons Learned - 10
● Make your state machines idempotent
○ State machines sometimes fail due to a transient error, e.g. a third-party service is down.
○ If the state machine stages are idempotent, you can “copy and paste” the output of the failed
state machine into a new execution that will pick up where the last one left off.
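One simple way to get this behaviour is to record completed stages in the state itself, so a rerun with the failed execution's output skips work that is already done. The sketch below is plain Python rather than a Step Functions definition, and the stage names and `EMAIL_SERVICE` flag are hypothetical.

```python
from typing import Callable, Dict

Stage = Callable[[dict], dict]

# Stand-in for a third-party dependency that is temporarily unavailable.
EMAIL_SERVICE = {"up": False}
calls = {"charge_card": 0, "send_email": 0}


def charge_card(state: dict) -> dict:
    calls["charge_card"] += 1
    return {**state, "charged": True}


def send_email(state: dict) -> dict:
    calls["send_email"] += 1
    if not EMAIL_SERVICE["up"]:
        raise ConnectionError("email service is down")
    return {**state, "emailed": True}


STAGES: Dict[str, Stage] = {"charge_card": charge_card, "send_email": send_email}


def run(stages: Dict[str, Stage], state: dict) -> dict:
    """Run stages in order, skipping any already listed in state["done"]."""
    state = {**state, "done": list(state.get("done", []))}
    state.pop("failed_at", None)
    for name, stage in stages.items():
        if name in state["done"]:
            continue  # already completed in an earlier execution
        try:
            state = stage(state)
        except Exception:
            # Return the last good state so it can seed a new execution.
            return {**state, "failed_at": name}
        state["done"] = state["done"] + [name]
    return state


first = run(STAGES, {"order_id": 42})   # fails at send_email
EMAIL_SERVICE["up"] = True              # the transient error clears
second = run(STAGES, first)             # resumes from the failed output
```

The card is charged once even though the workflow ran twice, which is the property that makes the "copy and paste" restart safe.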

Lessons Learned - 11 - Compose Services
● https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/

Serverless Application
Services

AWS AppSync and AWS Amplify
● AWS AppSync is a GraphQL API service with integrations with Lambda,
DynamoDB, Elasticsearch etc.
● AWS Amplify is an application framework and SDK for the rapid creation of
applications. It works with API Gateway and/or AWS AppSync.

Serverless and IaC
My thoughts on DevOps

Serverless and Infrastructure-as-Code are a great fit
The whole system, code and infrastructure, can be defined in a set of templates,
allowing simple CI/CD
https://github.com/aws-samples/codepipeline-nested-cfn/blob/master/codepipeline-cfn-codebuild.yml

DevOps - Responsibility
Who is responsible for this mess? Whose fault is this?
● The words “responsible” and “fault” are often used interchangeably
● “Fault” looks to attribute blame
● Taking “Responsible” back to its origin, it means able to respond
● Developers used to just be responsible for code, Operations used to be just
responsible for running code
● DevOps culture aims to create shared responsibility: reliable software running
in production
● DevSecOps aims for secure, reliable software running in production
● Mark Schwartz talks about DevSecFinBizOps

DevOps - Responsibility Example
A practical example of DevSecFinBizOps using Serverless and IaC

DevOps - Responsibility - Future Focus (R)
● The New Zealand Olympic Team do not do a review of their performance at
the previous games.
● They do a preview of the next Olympic Games.
○ Data from the previous games will naturally be introduced during the preview, but not for its
own sake.
○ This avoids getting bogged down in the current reality, instead building excitement for the
future.
● Similar to SRE Blameless “Postmortem”

My Details
● blog.h4.nz
● steve@h4.nz
● https://www.linkedin.com/in/steve-hogg/
