Serverless
DevOps Lessons Learned From Production
About Me - Steve Hogg
● Past - Founder of Web Drive Hosting Company
○ I ran the Operations team for shared hosting platforms with 20,000+ websites circa 2003
○ I ran large scale migrations of customer platforms to cloud based servers
● Now - AWS Contractor
○ Built many Serverless platforms over the last 2 years with IoT Labs and Blockchain Labs
○ Honed my Dev skills with TDD, Integration, Functional testing etc
Serverless - DevOps Lessons Learned
This presentation:
● Concepts - Abstract Ideas
● Demos - Those ideas in-action
What is Serverless?
What is Serverless
● Services where you can pretend that servers don’t exist
○ They do, but you’re not responsible for them
● Not just Functions-as-a-Service (FaaS)
○ Storage
○ Databases
■ DynamoDB
■ Athena
○ Queues
○ API services like API Gateway and AppSync
○ 3rd party services
Some Typical Uses
● Functions called directly
● Worker processes that read from queues
● The backend for APIs
● Single Page Apps
● Functions that customise cloud services
○ Pre and post event triggers
● Often used in combination with third-party services e.g. Auth0
Resources
Things I got started with
The “Evolution” of Application Platforms
https://www.youtube.com/watch?v=oE5lrNn7bAg - Yochay Kiriaty
Serverless Security
https://www.youtube.com/watch?v=CiyUD_rI8D8
Dependency vulnerability analysis: https://snyk.io/
Serverless - NoOps Is A Myth
“The cost and pain of developing software is approximately zero compared to the
operational cost of maintaining it over time.” - Charity Majors
https://www.youtube.com/watch?v=hG39tB5qqMc
● NoOps is a myth, Operations has just changed focus.
○ Serverless === MoreOps; you give up observability e.g. you can’t strace
○ The system still needs Operations
■ You need logging, monitoring, backups, triage (e.g. slow queries), pipelines
○ 10 to 30% of your budget should be on observability tools
○ Plan for failure e.g. fail-safe queues between services
Lambda Demos
Demos
Simple Lambda Examples
Serverless State Machines
State Machine Services - e.g. Step Functions
Demos 2
State Machines Examples
Lessons Learned
Lessons Learned - How Do You Eat An Elephant?
● Minimise the size of the learning pit: get started with low-hanging fruit
○ Worker jobs that get triggered with events are a good choice
■ Run on a timer - e.g. cron
■ Run when an image is uploaded
■ Process messages in a queue
The Learning Pit - Dr John Edwards
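A queue-triggered worker is about as small as a first Lambda gets. The sketch below assumes an SQS trigger, which delivers messages under `Records` with the payload as a string in `body`. The `process_message` function and the `job_id` field are hypothetical and only there for illustration.

```python
import json


def process_message(body: dict) -> str:
    """Hypothetical business logic: describe the job carried by one message."""
    return f"processed job {body['job_id']}"


def handler(event: dict, context: object = None) -> list:
    """Lambda-style entry point for a queue-triggered worker.

    An SQS trigger delivers a batch under event["Records"]; each record
    carries the message payload as a string in record["body"].
    """
    results = []
    for record in event.get("Records", []):
        body = json.loads(record["body"])
        results.append(process_message(body))
    return results


if __name__ == "__main__":
    # Local run with a hand-built event, no cloud deployment needed.
    sample_event = {"Records": [{"messageId": "1", "body": json.dumps({"job_id": 42})}]}
    print(handler(sample_event))
```

Running the handler locally against a hand-built event like this keeps the feedback loop short before anything is deployed.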
Lessons Learned - 1 - Getting Started
● IaC is great; deploy examples and learn from them
○ https://github.com/awslabs/serverless-application-model/tree/master/examples/apps
● Embrace the concept of time-to-value
○ Have a bias towards efficiency - “What is the smallest amount of work I can do to get this
working?”
○ Not due to laziness, but in the interest of getting a great result quickly
○ This makes you look for existing services you can compose into what you need
○ This makes you “sharpen your tools”, to learn more efficient ways of working
○ Be result focussed and take charge of the fundamental attribution error
■ We judge ourselves on our intentions, everyone else judges us on the result
■ The only powerful position we can take is to judge ourselves on the result too
● Have a Serverless-first approach
○ Put effort into coming up with a Serverless solution, as the up-front effort pays off
Lessons Learned - 2 - Getting Started
● Avoid premature optimisation
● Build things in multiple passes, adding layers of sophistication as you go
● It is OK to start building things with the console first before moving to IaC *
● It is OK to deploy IaC manually first before automating it with a pipeline *
● It is OK to prototype things without TDD when you only have a fuzzy idea *
● It is OK to have low-coverage integration tests at first *
● * Communication is critical. The whole team needs to buy into the per-service
level of engineering being applied, and that refactoring is a natural part of the
process.
Lessons Learned - 2 continued - Agreed Levels
Define and agree on the per-service level of engineering being targeted i.e. quality
Level 1 - Prototype (Beginner)
- TDD
Level 2 - Beta (Novice)
+ TDD (50%)
Level 3 - Small Scale (Competent)
+ TDD (80%)
- Integration Tests
- Smoke Tests
- Functional Tests
+ Continuous Integration
- Continuous Delivery
- Continuous Deployment
+ Infrastructure-as-Code
- Feature flags
- Software published as modules
Level 4 (Proficient)
+ TDD (90%)
+ Integration Tests
+ Smoke Tests
+ Functional Tests
+ Continuous Integration
+ Continuous Delivery
- Continuous Deployment
+ Infrastructure-as-Code
- Feature flags
- Software published as modules
Level 5 (Expert)
+ TDD (90%)
+ Integration Tests
+ Smoke Tests
+ Functional Tests
+ Continuous Integration
+ Continuous Delivery
+ Continuous Deployment
+ Infrastructure-as-Code
+ Feature flags
+ Software published as modules
Lessons Learned - 3
● Deploy. Lots.
○ The whole system is defined as an IaC template, which is easy to deploy.
○ You only pay for resources when you use them.
○ Allows you to create multiple environments like Dev, Staging, UAT, Prod.
○ You can spin up an environment, run integration tests and a report, then tear it all down.
○ A developer can deploy their own copy to get familiar with the system.
○ You can start up a new environment for a customer demo quickly
Lessons Learned - 4
● You can use multiple accounts as an alternative to multi-tenancy
○ Customers can pay for their own resources
○ Customers are responsible for their own Operations (backups, account security etc.)
● Use multiple accounts as a permission boundary
○ Lock down sensitive data, Lambdas and state machines into their own account, and use
cross-account permissions to allow another named account to use them.
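One way to picture the cross-account permission is as a resource-based policy statement on the locked-down function, naming the single account allowed to invoke it. This is a sketch only: the account IDs and function ARN are placeholders, and the helper name is made up for illustration.

```python
import json


def cross_account_invoke_statement(function_arn: str, trusted_account_id: str) -> dict:
    """Build a policy statement letting one named account invoke a function."""
    return {
        "Sid": f"AllowInvokeFrom{trusted_account_id}",
        "Effect": "Allow",
        # The account root principal delegates the decision to that account's own IAM.
        "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
    }


if __name__ == "__main__":
    # Placeholder ARN and account IDs, not real resources.
    statement = cross_account_invoke_statement(
        "arn:aws:lambda:us-east-1:111111111111:function:sensitive-fn",
        "222222222222",
    )
    print(json.dumps(statement, indent=2))
```

Keeping the statement to one action and one resource matches the idea of starting with minimal permissions and expanding only when needed.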
Lessons Learned - 5
● Version your templates
○ It is useful to have a parameter in the template that has the version number.
○ This helps when you have multiple deployments
○ It can be given to the functions too via environment variables e.g. read the version number
from package.json and include it as a param
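A rough sketch of passing the version through, assuming a template parameter named `Version` and a function environment variable named `VERSION` (both names are assumptions, not taken from the slides). The `Key=Value` string is the form accepted by `aws cloudformation deploy --parameter-overrides`.

```python
import json
import os


def read_version(package_json_text: str) -> str:
    """Pull the version string out of the contents of a package.json file."""
    return json.loads(package_json_text)["version"]


def parameter_overrides(version: str) -> list:
    """Format the version as a Key=Value template parameter override."""
    return [f"Version={version}"]


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style handler reporting the version it was deployed with."""
    return {"version": os.environ.get("VERSION", "unknown")}


if __name__ == "__main__":
    version = read_version('{"name": "image-service", "version": "1.4.0"}')
    print(parameter_overrides(version))
```

With the version visible in both the stack parameters and the function output, it is easier to tell which of several deployments is running which release.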
Lessons Learned - 6
● Have do and undo actions e.g. Saga Pattern
○ https://theburningmonk.com/2017/07/applying-the-saga-pattern-with-aws-lambda-and-step-functions/
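The do/undo idea can be sketched in plain Python, independent of Step Functions. Each step pairs an action with its compensation; when a step fails, the completed steps are undone in reverse order. The booking functions are hypothetical stand-ins, and this is not the implementation from the linked article.

```python
from typing import Callable, List, Tuple

# A step is (name, do action, undo action); both actions mutate shared state.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], state: dict) -> bool:
    """Run each do action; on failure, undo completed steps in reverse order."""
    log = state.setdefault("log", [])
    completed: List[Step] = []
    for step in steps:
        name, do, _undo = step
        try:
            do(state)
            completed.append(step)
        except Exception as error:
            log.append(f"{name} failed: {error}")
            for done_name, _do, done_undo in reversed(completed):
                done_undo(state)
                log.append(f"undid {done_name}")
            return False
    return True


def book_flight(state: dict) -> None:
    state["flight"] = "booked"


def cancel_flight(state: dict) -> None:
    state["flight"] = "cancelled"


def book_hotel(state: dict) -> None:
    raise RuntimeError("hotel service is down")


def cancel_hotel(state: dict) -> None:
    state["hotel"] = "cancelled"


if __name__ == "__main__":
    trip: dict = {}
    steps = [("book_flight", book_flight, cancel_flight),
             ("book_hotel", book_hotel, cancel_hotel)]
    print(run_saga(steps, trip), trip)
```

Keeping each do next to its undo means the compensation logic lives with the action it reverses, rather than being scattered across functions.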
Lessons Learned - 7 - Use Fail-Safe Queues
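A minimal sketch of a queue consumer that fails safely, assuming an SQS trigger with partial batch responses (`ReportBatchItemFailures`) enabled and a dead-letter queue configured on the source queue. Only the failed message IDs are reported back, so successful messages are not reprocessed and repeated failures end up parked where Operations can inspect and redrive them. The `process` function and its `fail` flag are hypothetical.

```python
import json


def process(body: dict) -> None:
    """Hypothetical work; raises when the message cannot be handled yet."""
    if body.get("fail"):
        raise RuntimeError("downstream service unavailable")


def handler(event: dict, context: object = None) -> dict:
    """Process a batch and report only the failed messages back to the queue."""
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Failed messages return to the queue; the rest are deleted.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}


if __name__ == "__main__":
    event = {"Records": [
        {"messageId": "a", "body": json.dumps({"fail": False})},
        {"messageId": "b", "body": json.dumps({"fail": True})},
    ]}
    print(handler(event))
```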
Lessons Learned - 8
● Pub/Sub is a nice way to decouple services
○ https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
Lessons Learned - 9
● Allow resume from any state
○ https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/
Lessons Learned - 10
● Make your state machines idempotent
○ State machines can fail due to a transient error, e.g. a third-party service is down
○ If the state machine stages are idempotent, you can “copy and paste” the output of the failed
state machine into a new one, which will pick up where the last one left off.
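The effect of idempotent stages can be sketched without any AWS services. In this illustration each stage records its result in the state under a `done` key (an assumed convention, not a Step Functions feature), so feeding the last good state into a fresh run skips the work that already completed.

```python
from typing import Callable, List, Optional, Tuple


def idempotent_stage(name: str, work: Callable[[dict], dict]) -> Callable[[dict], dict]:
    """Wrap a stage so it is skipped when its result is already in the state."""
    def run(state: dict) -> dict:
        if name in state.get("done", {}):
            return state  # already completed in an earlier execution
        result = work(state)
        done = dict(state.get("done", {}))
        done[name] = result
        return {**state, "done": done}
    return run


def run_machine(stages: List[Callable[[dict], dict]],
                state: dict) -> Tuple[dict, Optional[Exception]]:
    """Run stages in order; return the last good state and the error, if any."""
    for stage in stages:
        try:
            state = stage(state)
        except Exception as error:
            return state, error
    return state, None


if __name__ == "__main__":
    attempts = {"notify": 0}

    def charge(state: dict) -> dict:
        return {"receipt": "r-1"}

    def notify(state: dict) -> dict:
        attempts["notify"] += 1
        if attempts["notify"] == 1:
            raise RuntimeError("third-party service is down")
        return {"sent": True}

    stages = [idempotent_stage("charge", charge), idempotent_stage("notify", notify)]
    failed_output, error = run_machine(stages, {"order": 7})
    # Paste the failed run's output into a new run; "charge" is not repeated.
    print(run_machine(stages, failed_output))
```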
Lessons Learned - 11 - Compose Services
● https://www.jeremydaly.com/serverless-microservice-patterns-for-aws/
Serverless Application
Services
AWS AppSync and AWS Amplify
● AWS AppSync is a GraphQL API service with integrations with Lambda,
DynamoDB, Elasticsearch etc.
● AWS Amplify is an application framework and SDK for the rapid creation of
applications. It works with API Gateway and/or AWS AppSync
Demos 3
Amplify and AppSync
Serverless and IaC
My thoughts on DevOps
Serverless and Infrastructure-as-Code are a great fit
The whole system, code and infrastructure, can be defined in a set of templates,
allowing simple CI/CD
https://github.com/aws-samples/codepipeline-nested-cfn/blob/master/codepipeline-cfn-codebuild.yml
DevOps - Responsibility
Who is responsible for this mess? Whose fault is this?
● The words responsible and fault are often used interchangeably
● “Fault” looks to attribute blame
● Taking “Responsible” back to its origin, it means able to respond
● Developers used to just be responsible for code, Operations used to be just
responsible for running code
● DevOps culture aims to create shared responsibility: reliable software running
in production
● DevSecOps aims for secure, reliable software running in production
● Mark Schwartz talks about DevSecFinBizOps
DevOps - Responsibility Example
A practical example of DevSecFinBizOps using Serverless and IaC
DevOps - Responsibility - Future Focus (R)
● The New Zealand Olympic Team do not do a review of their performance at
the previous games.
● They do a preview of the next Olympic Games.
○ Data from the previous games will naturally be introduced during the preview, but not for its
own sake.
○ This avoids getting bogged down in the current reality, instead building excitement for the
future.
● Similar to SRE Blameless “Postmortem”
The End
Questions?
My Details
● blog.h4.nz
● steve@h4.nz
● https://www.linkedin.com/in/steve-hogg/

Editor's Notes

  • #3 Ops side of things, I’ve got plenty of experience with backups, security, human error
  • #7 Cold start problem mitigated by Optimistic UI
  • #9 Function sprawl, e.g. POST, PUT, GET, DELETE for each resource. How does this whole thing work? Function chaining: no visibility of what belongs together, no error handling (Step Functions et al. help with this). Functions should support a do and an undo, i.e. keep that logic together.
  • #10 Attack Surface - Keep your policies granular (per function permissions). Neutral: Start with minimal permissions, and only expand when needed. Moves aspect to better column. Third party: data you’re sharing, how it is shared, who are they?
  • #11 Bad queries still have an impact with Serverless databases
  • #13 0-Lambda -> 1-image-service. Copy one image, show logs, delete logs. Copy a few images, show logs, delete logs. Versioned deployments: https://awslabs.github.io/serverless-application-model/safe_lambda_deployments.html. X-Ray example. Alternatives: IOpipe etc.
  • #15 Good for long running workflows, up to 1 year. Complex workflows. Important workflows that need visibility.
  • #16 1-StepFunctions
  • #18 These first few slides are a bit high-level, but reflect my journey with learning new things. Like learning anything new, it can be slow and frustrating at first. This may be obvious. I used to hate change and learning new things, but now I embrace it. Don’t bite off more than you can chew. You want small achievable goals at first: the low-hanging fruit. Worker tasks, e.g. things you run with cron, are a good choice to start with.
  • #19 Time-to-value applies to all actions you take, e.g. use linting, local testing and REPL loops rather than publishing to the cloud to do testing. Even though you could roll your own service, the skill is in assembling various services and functions into a cohesive system.
  • #20 Show levels of engineering. What is the one thing I need to do? Wrong question. What are the thousand things I need to do? What is the first step?
  • #21 There is a jump from level 2 to level 3 e.g. “throwaway prototype”
  • #22 A focus on time-to-value helps get team buy-in. The skill is in assembling various services and functions into a cohesive system.
  • #26 Nice interface to hand over to Operations: Operations can monitor and redrive messages.
  • #36 The words should be used differently. Responsible examples: my child is misbehaving at school, someone falls over in front of me. Simple problems in Dev => big workarounds for Ops. Shared responsibility: quality service; reliable software running in production. DevOps is: Culture, Lean, Automation, Measurement, Sharing.