2. About Me - Steve Hogg
● Past - Founder of Web Drive Hosting Company
○ I ran the Operations team for shared hosting platforms with 20,000+ websites circa 2003
○ I ran large-scale migrations of customer platforms to cloud-based servers
● Now - AWS Contractor
○ Built many Serverless platforms over the last 2 years with IoT Labs and Blockchain Labs
○ Honed my development skills with TDD, integration and functional testing, etc.
3. Serverless - DevOps Lessons Learned
This presentation:
● Concepts - Abstract Ideas
● Demos - Those ideas in-action
5. What Is Serverless?
● Services where you can pretend that servers don’t exist
○ They do, but you’re not responsible for them
● Not just Functions-as-a-Service (FaaS)
○ Storage
○ Databases
■ DynamoDB
■ Athena
○ Queues
○ API services like API Gateway and AppSync
○ Third-party services
6. Some Typical Uses
● Functions called directly
● Worker processes that read from queues
● The backend for APIs
● Single Page Apps
● Functions that customise cloud services
○ Pre- and post-event triggers
● Often used in combination with third-party services e.g. Auth0
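As an illustration of a function that customises a cloud service, here is a minimal sketch of an Amazon Cognito pre sign-up trigger. It assumes a hypothetical rule (only emails from an allow-listed domain may register, and those users are auto-confirmed); `ALLOWED_DOMAINS` and the handler name are made up for the example.

```python
# Hypothetical allow-list of email domains; replace with your own.
ALLOWED_DOMAINS = {"example.com"}


def pre_sign_up_handler(event, context=None):
    """Cognito User Pool pre sign-up trigger: reject unapproved domains,
    auto-confirm approved ones."""
    email = event["request"]["userAttributes"].get("email", "")
    domain = email.rpartition("@")[2].lower()
    if domain not in ALLOWED_DOMAINS:
        # An exception raised here makes Cognito reject the sign-up request.
        raise ValueError(f"Sign-up not allowed for domain {domain!r}")
    # Skip the confirmation-code step for users from approved domains.
    event.setdefault("response", {})["autoConfirmUser"] = True
    # Cognito expects the (possibly modified) event to be returned.
    return event
```

The service still owns sign-up. The function only adds the business rule at the hook the service exposes.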
10. Serverless - NoOps Is A Myth
“The cost and pain of developing software is approximately zero compared to the
operational cost of maintaining it over time.” - Charity Majors
https://www.youtube.com/watch?v=hG39tB5qqMc
● NoOps is a myth; Operations has just changed focus.
○ Serverless === MoreOps; you give up some observability, e.g. you can’t strace a process
○ The system still needs Operations
■ You need logging, monitoring, backups, triage (e.g. slow queries), pipelines
○ 10 to 30% of your budget should be spent on observability tools
○ Plan for failure e.g. fail-safe queues between services
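Because you can’t attach to the process, logs carry most of the observability burden. A minimal sketch of structured logging, assuming one JSON object per line written to stdout (which Lambda forwards to CloudWatch Logs). The `log_event` helper and the field names are hypothetical.

```python
import json
import time


def log_event(level, message, **fields):
    """Emit one JSON object per line so fields such as correlation_id
    can be filtered and aggregated later."""
    record = {"ts": round(time.time(), 3), "level": level, "message": message, **fields}
    line = json.dumps(record, sort_keys=True, default=str)
    print(line)  # in Lambda, stdout ends up in CloudWatch Logs
    return line


# Example: tag every log line with the ID of the request being handled.
log_event("INFO", "image resized", correlation_id="req-123", duration_ms=42)
```

Consistent field names make triage (e.g. spotting slow queries by `duration_ms`) a query rather than a grep.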
17. Lessons Learned - How Do You Eat An Elephant?
● Minimise the size of the learning pit: get started with low-hanging fruit
○ Worker jobs that get triggered with events are a good choice
■ Run on a timer - e.g. cron
■ Run when an image is uploaded
■ Process messages in a queue
The Learning Pit - Dr John Edwards
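A queue worker is a good first Lambda. The sketch below assumes an SQS event source with partial batch responses (`ReportBatchItemFailures`) enabled, so only the messages that failed are retried. `process_message` is a hypothetical stand-in for the real job, e.g. resizing an uploaded image.

```python
import json


def process_message(payload):
    """Hypothetical unit of work, e.g. resize the image named in the payload."""
    if "key" not in payload:
        raise ValueError("message has no 'key'")
    return f"processed {payload['key']}"


def sqs_worker_handler(event, context=None):
    failures = []
    for record in event.get("Records", []):
        try:
            process_message(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed; SQS makes it visible again and,
            # if a dead-letter queue is configured, eventually moves it there.
            failures.append({"itemIdentifier": record["messageId"]})
    # Only honoured when ReportBatchItemFailures is enabled on the event source mapping.
    return {"batchItemFailures": failures}
```

One bad message no longer forces the whole batch to be reprocessed, and poison messages end up somewhere Operations can inspect them.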
18. Lessons Learned - 1 - Getting Started
● IaC is great; deploy examples and learn from them
○ https://github.com/awslabs/serverless-application-model/tree/master/examples/apps
● Embrace the concept of time-to-value
○ Have a bias towards efficiency - “What is the smallest amount of work I can do to get this working?”
○ Not due to laziness, but in the interest of getting a great result quickly
○ This makes you look for existing services you can compose into what you need
○ This makes you “sharpen your tools”, to learn more efficient ways of working
○ Be result-focussed and take charge of the fundamental attribution error
■ We judge ourselves on our intentions; everyone else judges us on the result
■ The only powerful position we can take is to judge ourselves on the result too
● Have a Serverless-first approach
○ Put effort into coming up with a Serverless solution, as the up-front effort pays off
19. Lessons Learned - 2 - Getting Started
● Avoid premature optimisation
● Build things in multiple passes, adding layers of sophistication as you go
● It is OK to start building things with the console first before moving to IaC *
● It is OK to deploy IaC manually first before automating it with a pipeline *
● It is OK to prototype things without TDD when you only have a fuzzy idea *
● It is OK to have low-coverage integration tests at first *
● * Communication is critical. The whole team needs to buy into the per-service level of engineering being applied, and to accept that refactoring is a natural part of the process.
21. Lessons Learned - 3
● Deploy. Lots.
○ The whole system is defined as an IaC template, which is easy to deploy.
○ You only pay for resources when you use them.
○ Allows you to create multiple environments like Dev, Staging, UAT, Prod.
○ You can spin up an environment, run integration tests and a report, then tear it all down.
○ A developer can deploy their own copy to get familiar with the system.
○ You can start up a new environment for a customer demo quickly
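A sketch of the “spin up, test, tear down” loop, assuming the AWS SAM CLI and a template that declares a hypothetical `Stage` parameter. The app name, test command and helper functions are illustrative; `dry_run=True` only returns the commands so the flow can be checked without touching an AWS account.

```python
import subprocess


def stack_name(app, env):
    # One stack per environment, e.g. "image-service-dev" or "image-service-pr-42".
    return f"{app}-{env}".lower()


def deploy_cmd(app, env):
    return ["sam", "deploy", "--stack-name", stack_name(app, env),
            "--parameter-overrides", f"Stage={env}",  # assumes a Stage parameter
            "--capabilities", "CAPABILITY_IAM",
            "--resolve-s3", "--no-confirm-changeset"]


def delete_cmd(app, env):
    return ["sam", "delete", "--stack-name", stack_name(app, env), "--no-prompts"]


def ephemeral_test_run(app, env, test_cmd, dry_run=True):
    """Deploy a throwaway environment, run integration tests, always tear down."""
    ran = []
    try:
        for cmd in (deploy_cmd(app, env), test_cmd):
            ran.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    finally:
        # Tear down even if the deploy or the tests failed, so you stop paying.
        ran.append(delete_cmd(app, env))
        if not dry_run:
            subprocess.run(delete_cmd(app, env), check=True)
    return ran
```

Deriving the stack name from the environment is what lets developers, CI runs and customer demos each get their own copy without colliding.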
22. Lessons Learned - 4
● You can use multiple accounts as an alternative to multi-tenancy
○ Customers can pay for their own resources
○ Customers are responsible for their own Operations (backups, account security etc.)
● Use multiple accounts as a permission boundary
○ Lock down sensitive data, Lambdas and state machines into their own account, and use
cross-account permissions to allow another named account to use them.
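A minimal sketch of the cross-account idea: a resource-based policy on a function in the locked-down account that lets one named account invoke it. The function ARN and account IDs are placeholders. In practice the statement is usually added with `aws lambda add-permission`, and the calling account also needs an IAM policy allowing `lambda:InvokeFunction` on its side.

```python
import json


def cross_account_invoke_policy(function_arn, caller_account_id):
    """Resource-based policy allowing one named account to invoke a function
    that lives in a separate, locked-down account."""
    if not (caller_account_id.isdigit() and len(caller_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": f"AllowInvokeFrom{caller_account_id}",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{caller_account_id}:root"},
            "Action": "lambda:InvokeFunction",
            "Resource": function_arn,
        }],
    }


policy = cross_account_invoke_policy(
    "arn:aws:lambda:us-east-1:444455556666:function:sensitive-lookup",  # placeholder
    "111122223333",
)
print(json.dumps(policy, indent=2))
```

The grant names exactly one account and one action, so the sensitive account stays a permission boundary rather than a shared space.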
23. Lessons Learned - 5
● Version your templates
○ It is useful to have a parameter in the template that has the version number.
○ This helps when you have multiple deployments
○ It can be given to the functions too via environment variables e.g. read the version number
from package.json and include it as a param
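A sketch of the version flow described above: read `version` from package.json at deploy time, pass it as a template parameter, and expose it to functions as an environment variable. The parameter name `Version` and the variable `APP_VERSION` are assumptions; the template would need to declare the parameter and map it into each function’s environment.

```python
import json
import os
from pathlib import Path


def read_version(package_json="package.json"):
    return json.loads(Path(package_json).read_text())["version"]


def version_override(version):
    # e.g. used as: sam deploy --parameter-overrides Version=1.4.2
    # (assumes the template declares a `Version` parameter)
    return f"Version={version}"


def handler(event, context=None):
    # Each function can report which deployment it belongs to;
    # APP_VERSION is a hypothetical variable set from the template parameter.
    return {"version": os.environ.get("APP_VERSION", "unknown")}
```

With several deployments running side by side, a `version` field in responses or logs answers “which build is this?” without digging through the console.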
24. Lessons Learned - 6
● Have do and undo actions e.g. Saga Pattern
○ https://theburningmonk.com/2017/07/applying-the-saga-pattern-with-aws-lambda-and-step-functions/
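The core of the Saga Pattern is that every “do” action has a matching “undo”, and a failure part-way through runs the undos of the completed steps in reverse order. A minimal in-process sketch (the linked article implements this with Step Functions; the step names here are hypothetical):

```python
def run_saga(steps, context):
    """Run (do, undo) pairs in order. If a step fails, compensate by running
    the undo actions of the already-completed steps in reverse, then re-raise."""
    completed = []
    try:
        for do, undo in steps:
            do(context)
            completed.append(undo)
    except Exception:
        for undo in reversed(completed):
            undo(context)
        raise
    return context
```

For example, with steps `book_hotel`, `book_flight` and `charge_card`, a declined card cancels the flight and then the hotel. Keeping each do/undo pair together also keeps that logic in one place.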
26. Lessons Learned - 8
● Pub/Sub is a nice way to decouple services
○ https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
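To show why Pub/Sub decouples services, here is a minimal in-memory stand-in for a topic such as SNS or EventBridge: publishers only know the topic name, never the subscribers. The `EventBus` class and topic names are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a pub/sub topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Fan out to every subscriber; the publisher doesn't know who they are.
        return [handler(message) for handler in self._subscribers[topic]]


bus = EventBus()
bus.subscribe("image.uploaded", lambda m: f"thumbnail:{m['key']}")
bus.subscribe("image.uploaded", lambda m: f"index:{m['key']}")
bus.publish("image.uploaded", {"key": "cat.jpg"})
```

Adding a new consumer is just another `subscribe` call. The uploading service doesn’t change.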
27. Lessons Learned - 9
● Allow resume from any state
○ https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/
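The linked AWS post does this with Step Functions. The sketch below only illustrates the idea in plain Python, assuming a hypothetical linear workflow of named states: on failure it records which state failed and the input it received, so a later run can start at exactly that state.

```python
def run_from(states, start_at, data):
    """Run an ordered list of (name, fn) states starting at `start_at`.
    On failure, return the failed state's name and its input for resuming."""
    names = [name for name, _ in states]
    for name, fn in states[names.index(start_at):]:
        try:
            data = fn(data)
        except Exception as exc:
            return {"status": "FAILED", "resume_at": name, "input": data, "error": str(exc)}
    return {"status": "SUCCEEDED", "output": data}
```

Usage: `result = run_from(states, "validate", order)`; if it failed, `run_from(states, result["resume_at"], result["input"])` picks up where it stopped instead of repeating earlier states.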
28. Lessons Learned - 10
● Make your state machines idempotent
○ A state machine can fail due to a transient error, e.g. a third-party service is down
○ If the state machine stages are idempotent, you can “copy and paste” the output of the failed state machine into a new one that picks up where the last one left off.
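One way to make a side-effecting stage idempotent is to key it on its input and store the result, so a re-run with the same input returns the stored result instead of repeating the side effect. A minimal sketch, where a dict stands in for a durable store such as a DynamoDB table. A real store would need a conditional write to handle concurrent runs; Powertools for AWS Lambda also offers a ready-made idempotency utility.

```python
import hashlib
import json


class IdempotentStep:
    """Wraps a side-effecting step so repeated calls with the same input
    return the stored result instead of running the side effect again."""

    def __init__(self, fn, store=None):
        self.fn = fn
        self.store = {} if store is None else store  # stand-in for a durable table

    @staticmethod
    def key_for(payload):
        # Canonical JSON so key order doesn't change the idempotency key.
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode()).hexdigest()

    def __call__(self, payload):
        key = self.key_for(payload)
        if key not in self.store:
            self.store[key] = self.fn(payload)
        return self.store[key]
```

Wrapping a hypothetical `send_invoice` step this way means replaying a failed execution’s output into a new one won’t send the invoice twice.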
31. AWS AppSync and AWS Amplify
● AWS AppSync is a GraphQL API service with integrations with Lambda, DynamoDB, Elasticsearch etc.
● AWS Amplify is an application framework and SDK for the rapid creation of applications. It works with API Gateway and/or AWS AppSync
34. Serverless and Infrastructure-as-Code are a great fit
The whole system, code and infrastructure, can be defined in a set of templates, allowing simple CI/CD
https://github.com/aws-samples/codepipeline-nested-cfn/blob/master/codepipeline-cfn-codebuild.yml
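To make “code and infrastructure in one template” concrete, here is a minimal AWS SAM template built as a Python dict and written out as JSON (CloudFormation accepts JSON as well as YAML). The function name, handler, `CodeUri` path and route are illustrative assumptions.

```python
import json


def minimal_sam_template():
    """A tiny AWS SAM template: one function behind an HTTP API route."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Transform": "AWS::Serverless-2016-10-31",
        "Resources": {
            "HelloFunction": {
                "Type": "AWS::Serverless::Function",
                "Properties": {
                    "Handler": "app.handler",  # hypothetical module.function
                    "Runtime": "python3.12",
                    "CodeUri": "src/",  # the function code lives next to the template
                    "Events": {
                        "Api": {"Type": "HttpApi",
                                "Properties": {"Path": "/hello", "Method": "get"}}
                    },
                },
            }
        },
    }


print(json.dumps(minimal_sam_template(), indent=2))
```

Because the template sits in the same repository as the code, one commit can change both, and the pipeline deploys them together.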
35. DevOps - Responsibility
Who is responsible for this mess? Whose fault is this?
● The words “responsible” and “fault” are often used interchangeably
● “Fault” looks to attribute blame
● Taken back to its origin, “responsible” means “able to respond”
● Developers used to be responsible only for code; Operations used to be responsible only for running it
● DevOps culture aims to create shared responsibility: reliable software running
in production
● DevSecOps aims for secure, reliable software running in production
● Mark Schwartz talks about DevSecFinBizOps
36. DevOps - Responsibility Example
A practical example of DevSecFinBizOps using Serverless and IaC
37. DevOps - Responsibility - Future Focus (R)
● The New Zealand Olympic Team do not do a review of their performance at
the previous games.
● They do a preview of the next Olympic Games.
○ Data from the previous games will naturally be introduced during the preview, but not for its
own sake.
○ This avoids getting bogged down in the current reality, instead building excitement for the
future.
● Similar to SRE Blameless “Postmortem”
On the Ops side of things, I’ve got plenty of experience with backups, security and human error
Cold start problem mitigated by Optimistic UI
Function sprawl. e.g. POST, PUT, GET, DELETE for each resource. How does this whole thing work?
Function chaining - no visibility of what belongs together, no error handling (Step Functions et al. help with this).
Functions should support a do and an undo i.e. keep that logic together
Attack Surface - Keep your policies granular (per function permissions)
Neutral: Start with minimal permissions, and only expand when needed. Moves aspect to better column.
Third party: data you’re sharing, how it is shared, who are they?
Bad queries still have an impact with Serverless databases
0-Lambda -> 1-image-service
Copy one image, show logs, delete logs
Copy a few images, show logs, delete logs
Versioned deployments
https://awslabs.github.io/serverless-application-model/safe_lambda_deployments.html
X-Ray example. Alternatives: IO Pipe etc.
Good for long-running workflows (up to 1 year), complex workflows, and important workflows that need visibility.
1-StepFunctions
These first few slides are a bit high-level, but reflect my journey with learning new things.
Like learning anything new, it can be slow and frustrating at first.
This may be obvious
I used to hate change and learning new things, but now I embrace it
Don’t bite off more than you can chew. You want small achievable goals at first; the low hanging fruit.
Worker tasks, e.g. things you run with cron are a good choice to start with
Time-to-value applies to all actions you take
e.g. use linting, local testing, REPL loops rather than publishing to the cloud to do testing
Even though you could roll your own service, the skill is in assembling various services and functions into a cohesive system
Show levels of engineering
What is the one thing I need to do? Wrong question. What are the thousand things I need to do? What is the first step?
There is a jump from level 2 to level 3 e.g. “throwaway prototype”
A focus on time-to-value helps get team buy-in
The skill is in assembling various services and functions into a cohesive system
Nice interface to handover to Operations with: Operations can monitor, redrive messages.
The words should be used differently
Responsible examples: my child is misbehaving at school, someone falls over in front of me
Simple problems in Dev => big workarounds for Ops
Shared responsibility: quality service; reliable software running in production
DevOps is: Culture, Lean, Automation, Measurement, Sharing