Beware the potholes on the road to serverless

Looking in from the outside, serverless seems so simple! And yet, many companies are struggling on their journey to serverless. In this talk, I highlight a number of mistakes companies are making when they adopt serverless.

  1. 1. MIND THE POTHOLES
  2. 2. What do you mean by ‘serverless’?
  3. 3. “Serverless”
  4. 4. “It is serverless the same way WiFi is wireless.” Gojko Adzic http://bit.ly/2yQgwwb
  5. 5. Serverless means… you don’t pay for it if no one uses it; you don’t need to worry about scaling; you don’t need to provision and manage servers
  6. 6. in other words, it’s a lot like taking a cab
  7. 7. Ownership Fuel Navigate To get there! Focus on getting there!
  8. 8. HW Ownership OS Runtime & Scale Code Focus on getting there! Physical Servers Virtual Machines Containers Serverless
  9. 9. Nano Services Self Managed Cost Paradigm Change Async Dynamic agile env
  10. 10. “why are we failing at this?”
  11. 11. hidden dangers
  12. 12. monolith microservices serverless
  13. 13. monolith microservices serverless observability distributed systems bounded context
  15. 15. monolith microservices serverless observability distributed systems bounded context event driven
  16. 16. monolith serverless missing learnings from microservices
  17. 17. monolith serverless missing learnings from microservices poor decisions
  18. 18. Yan Cui http://theburningmonk.com @theburningmonk AWS user for 10 years
  19. 19. http://bit.ly/yubl-serverless
  20. 20. Yan Cui http://theburningmonk.com @theburningmonk Developer Advocate @
  21. 21. Yan Cui http://theburningmonk.com @theburningmonk Independent Consultant: advise, training, delivery
  22. 22. https://theburningmonk.com/workshops Amsterdam, March 19-20 Helsinki, May 4-5 Stockholm, May 14-15 Dublin, June 16-17 London, September 24-25 early bird until Feb 11
  23. 23. #1 not letting go of legacy thinking
  24. 24. “we’re doing serverless, but why aren’t things going faster?”
  25. 25. Socio Technical
  26. 26. there are no silver bullets
  27. 27. centralised team Team A Team B Team C Team D …
  28. 28. “but the developers don’t understand AWS and how our infrastructure is set up”
  29. 29. “but the developers don’t understand AWS and how our infrastructure is set up” let’s solve this problem instead!
  30. 30. what got you here won’t get you there
  31. 31. Monolithic Functions: if (path === "/user" && method === "GET") { return getUser(…); } else if (path === "/user" && method === "DELETE") { return deleteUser(…); } else if (path === "/user" && method === "POST") { return createUser(…); } else if …
  32. 32. Single-Purposed Functions: GET /user; POST /user; DELETE /user
  33. 33. [Diagram, the backdrop for slides 33-40 and 42] Monolithic: a single function, user-api-dev (author: yan.cui, feature: user-api). Single-Purposed: three functions, user-api-dev-get-user, user-api-dev-create-user and user-api-dev-delete-user (each author: yan.cui, feature: user-api).
  34. 34. find related functions by prefix
  35. 35. discoverability (without having to dig into the code)
  36. 36. what does it do?
  37. 37. dynamodb:GetItem, dynamodb:PutItem, dynamodb:DeleteItem
  38. 38. dynamodb:GetItem, dynamodb:PutItem, dynamodb:DeleteItem: no least privilege…
  39. 39. require(x), require(y), require(z)
  41. 41. more dependencies means slower cold starts
  42. 42. require(x), require(y), require(z): worse cold start performance
  43. 43. keep functions simple, and single-purposed
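A minimal sketch of the single-purposed layout from slides 32-43, assuming an API Gateway style event object. The in-memory `users` map and the `pathParameters.id` field are illustrative stand-ins and do not come from the talk; in a real deployment each handler would be packaged as its own function (for example user-api-dev-get-user).

```javascript
// Illustrative stand-in for a real data store such as a DynamoDB table.
const users = new Map();

// GET /user : would be deployed on its own as user-api-dev-get-user.
async function getUser(event) {
  const user = users.get(event.pathParameters.id);
  return user
    ? { statusCode: 200, body: JSON.stringify(user) }
    : { statusCode: 404, body: JSON.stringify({ message: "user not found" }) };
}

// POST /user : would be deployed on its own as user-api-dev-create-user.
async function createUser(event) {
  const user = JSON.parse(event.body);
  users.set(user.id, user);
  return { statusCode: 201, body: JSON.stringify(user) };
}

// DELETE /user : would be deployed on its own as user-api-dev-delete-user.
async function deleteUser(event) {
  users.delete(event.pathParameters.id);
  return { statusCode: 204, body: "" };
}
```

Because no handler routes on path or method, each one can be named, tagged, permissioned and bundled separately, which is the point the preceding slides make.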
  44. 44. #2 one account that rules them all
  45. 45. mind the shared limits
  46. 46. Resource Limits: no. of DynamoDB tables; no. of API Gateway regional APIs; no. of API Gateway edge-optimized APIs; no. of Kinesis shards; no. of IAM roles; no. of S3 buckets; no. of CloudFormation stacks; no. of SNS subscription filters; no. of SSM parameters; …
  47. 47. Throughput Limits: DynamoDB read & write; API Gateway requests/second; Lambda concurrent executions; SSM parameter ops/second; …
  48. 48. compartmentalise security breaches
  49. 49. One account per Team per Environment
  50. 50. #3 do first, research later
  51. 51. https://einaregilsson.com/serverless-15-percent-slower-and-eight-times-more-expensive/
  52. 52. the platforms need to do better at educating users on how to choose between different services
  53. 53. SNS vs SQS vs Kinesis vs MSK? the platforms need to do better at educating users on how to choose between different services
  54. 54. Comparison of Kinesis, SQS, SNS and EventBridge:
                      Kinesis                               SQS                             SNS          EventBridge
      ordering        by shard                              none (standard), global (FIFO)  none         none
      replay events   up to 7 days                          none                            none         none
      mode            batched                               batched (up to 10)              singular     singular
      retry           retried until success (customizable)  retry + DLQ                     retry + DLQ  retry + DLQ
      concurrency     1 per shard                           auto-scaled                     fan-out!!!   fan-out!!!
      subscribers     many                                  one-to-one                      many         many
  55. 55. https://medium.com/theburningmonk-com/all-my-posts-on-serverless-aws-lambda-43c17a147f91
  56. 56. https://www.jeremydaly.com/newsletter/
  57. 57. #4 not using a deployment toolkit
  58. 58. https://lumigo.io/blog/comparison-of-lambda-deployment-frameworks/
  59. 59. don’t write your own deployment framework
  60. 60. #5 console-driven development
  61. 61. #6 one repo per function
  62. 62. [Diagram, shown on slides 62-63: nine separate GitHub repos]
  64. 64. monorepo?
  65. 65. github repo
  66. 66. one repo per service?
  67. 67. [Diagram: four GitHub repos, one each for user-api, timeline-api, relationship-api and search-api]
  68. 68. https://lumigo.io/blog/mono-repo-vs-one-per-service/
  70. 70. CI/CD pipeline per service
  71. 71. functions are deployed together, as a stack
  72. 72. #7 unencrypted secrets in env vars
  73. 73. secrets should NEVER be in plain text in env variables
  74. 74. SSM Parameter Store Secret 1 Secret 2 IAM Environment: SECRET_1: … SECRET_2: … Environment: SECRET_1: … SECRET_2: …
  75. 75. SSM Parameter Store Secret 1 Secret 2 IAM Environment: SECRET_1: … SECRET_2: … Environment: SECRET_1: … SECRET_2: … yay!
  76. 76. SSM Parameter Store Secret 1 Secret 2 IAM fetch at cold start, cache, invalidate every x mins
  77. 77. https://github.com/middyjs/middy
  78. 78. SSM Parameter Store Secret 1 Secret 2 IAM switch to Higher Throughput if you need more than 40 ops/s
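The "fetch at cold start, cache, invalidate every x mins" pattern from slide 76 could be sketched roughly as follows. `createSecretCache` and `fetchSecretsFromStore` are hypothetical names introduced for illustration; the fetch function stands in for a real SSM Parameter Store lookup, and the five-minute TTL is an arbitrary choice.

```javascript
// Returns a getter that calls `fetchSecrets` on first use and again once
// `ttlMs` has elapsed; in between it serves the cached value.
// `now` is injectable so the expiry logic can be exercised without waiting.
function createSecretCache(fetchSecrets, ttlMs, now = Date.now) {
  let cached = null;
  let expiresAt = 0;

  return async function getSecrets() {
    if (cached === null || now() >= expiresAt) {
      cached = await fetchSecrets();
      expiresAt = now() + ttlMs;
    }
    return cached;
  };
}

// Hypothetical stand-in for the real lookup against a secret store.
async function fetchSecretsFromStore() {
  return { SECRET_1: "example-value" };
}

// Created outside the handler, so the cache lives as long as the container.
const getSecrets = createSecretCache(fetchSecretsFromStore, 5 * 60 * 1000);

async function handler(event) {
  const { SECRET_1 } = await getSecrets();
  // Use the secret here; avoid logging or returning it.
  return { ok: SECRET_1 !== undefined };
}
```

The secret never appears in the function's environment variables, and the TTL bounds how long a rotated secret can remain stale inside a warm container. The middy project linked on slide 77 is one place to look for ready-made middleware in this spirit.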
  79. 79. #8 not following least privilege principle
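As a rough illustration of least privilege for the three single-purposed functions from slides 33-38, each function can be given only the one DynamoDB action it needs. The table ARN below is a made-up placeholder, and `policyFor` is a hypothetical helper; only the IAM policy document shape (Version, Statement, Effect, Action, Resource) is standard.

```javascript
// Hypothetical table ARN, used only for illustration.
const USERS_TABLE_ARN = "arn:aws:dynamodb:us-east-1:123456789012:table/users";

// Builds an IAM policy document allowing exactly the listed actions on one resource.
function policyFor(actions, resource) {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: actions, Resource: resource }],
  };
}

// One narrowly scoped policy per function, instead of one broad policy shared by all.
const policies = {
  "user-api-dev-get-user": policyFor(["dynamodb:GetItem"], USERS_TABLE_ARN),
  "user-api-dev-create-user": policyFor(["dynamodb:PutItem"], USERS_TABLE_ARN),
  "user-api-dev-delete-user": policyFor(["dynamodb:DeleteItem"], USERS_TABLE_ARN),
};

console.log(JSON.stringify(policies["user-api-dev-get-user"], null, 2));
```

With this split, a compromised get-user function can read items but cannot write or delete them.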
  80. 80. #9 missing DLQs
  81. 81. Lambda event sources (http://amzn.to/2v7Kc3b). async: S3, SNS, SES, CloudFormation, CloudWatch Logs, CloudWatch Events, Scheduled Events, CodeCommit, AWS Config. sync: Cognito, Alexa, Lex, API Gateway. pulling: DynamoDB Stream, Kinesis Stream, SQS.
  82. 82. Same diagram (http://amzn.to/2vs2lIg), with the async group annotated: Lambda handles retries (twice, then DLQ)
  83. 83. configure DLQ for async functions so you don’t lose failed events
  84. 84. DLQ: payload. Lambda Destinations: payload, context(s), and response.
  85. 85. DLQ: payload; DeadLetterErrors. Lambda Destinations: payload, context(s), and response; no error metrics!
  86. 86. #10 too much/too little concurrency
  87. 87. “Lambda generates too much load for the downstream system”
  88. 88. SNS → Lambda: one invocation per message
  89. 89. SNS → Lambda → Downstream System
  90. 90. [The Kinesis / SQS / SNS / EventBridge comparison table from slide 54, shown again]
  91. 91. if you want… maximum throughput: SNS; precise control over throughput: Kinesis
  92. 92. if you want… maximum throughput: SNS; precise control over throughput: Kinesis (how quickly it scales out)
  93. 93. if you want… maximum throughput: SNS; precise control over throughput: Kinesis (how quickly it scales out); also shown: SQS, DynamoDB Streams
  94. 94. [The Kinesis / SQS / SNS / EventBridge comparison table from slide 54, shown again]
  95. 95. #11 cold starts
  96. 96. “cold starts only happen to the first request”
  97. 97. function invocation vs concurrent execution (i.e. a container)
  98. 98. function invocation = method call; concurrent execution (i.e. a container) = class instance
  99. 99. Lambda scales the number of concurrent executions based on traffic
  100. 100. existing “containers” are reused where possible
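To see why cold starts are not limited to the first request, here is a toy simulation. It assumes, purely for illustration, that a container handles one invocation at a time, is reused as soon as it is free, and never expires; real Lambda behaviour is more nuanced. `countColdStarts` and the sample timings are invented for this sketch.

```javascript
// Counts how many invocations find no idle container and therefore cold start.
function countColdStarts(invocations) {
  const busyUntil = []; // one entry per container: the time at which it becomes free
  let coldStarts = 0;
  const sorted = [...invocations].sort((a, b) => a.startMs - b.startMs);

  for (const { startMs, durationMs } of sorted) {
    const idle = busyUntil.findIndex((freeAt) => freeAt <= startMs);
    if (idle === -1) {
      // No free container: a new one has to be started.
      coldStarts += 1;
      busyUntil.push(startMs + durationMs);
    } else {
      // An existing container is reused.
      busyUntil[idle] = startMs + durationMs;
    }
  }
  return { coldStarts, containers: busyUntil.length };
}

// Requests arriving one after another reuse a single container.
const sequential = [
  { startMs: 0, durationMs: 100 },
  { startMs: 200, durationMs: 100 },
  { startMs: 400, durationMs: 100 },
];

// Requests that overlap each need their own container.
const overlapping = [
  { startMs: 0, durationMs: 100 },
  { startMs: 10, durationMs: 100 },
  { startMs: 20, durationMs: 100 },
  { startMs: 200, durationMs: 100 },
];

console.log(countColdStarts(sequential));  // { coldStarts: 1, containers: 1 }
console.log(countColdStarts(overlapping)); // { coldStarts: 3, containers: 3 }
```

In the overlapping case the second and third requests each pay a cold start even though the function has already been invoked, which is the misconception slide 96 calls out.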
  101. 101. [Animation, slides 101-110: invocations plotted against time, with more invocations added on each slide]
  111. 111. [Diagram: the same timeline, with warmer pings interleaved among the invocations]
  112. 112. Lambda warmers don’t work when you have more than 1 concurrent execution
  113. 113. FREQUENCY DURATION
  114. 114. FREQUENCY: dictated by user traffic, out of your control. DURATION
  115. 115. cold starts are generally not an issue if you have a steady traffic pattern
  116. 116. [Chart: req/s over time]
  117. 117. [Chart: req/s over time] El Clásico
  118. 118. [Chart: req/s over time] lunch, dinner
  119. 119. FREQUENCY. DURATION: optimize this!
  120. 120. minimise the duration of cold starts so they fall within acceptable latency range
  121. 121. [Chart: req/s over time; lunch, dinner] Provisioned Concurrency
  122. 122. [Chart: req/s over time; lunch, dinner] Provisioned Concurrency, On-Demand Concurrency
  123. 123. https://lumigo.io/blog/provisioned-concurrency-the-end-of-cold-starts/
  124. 124. there are no silver bullets
  125. 125. Provisioned Concurrency is a powerful tool IFF you have a cold start problem; don’t use it by default
  126. 126. #12 RDS connection handling
  127. 127. default RDS configs are bad for Lambda
  128. 128. default RDS configs are bad for Lambda: idle connections are not closed; too many connections per “container”; max open connections is too low
  129. 129. https://www.jeremydaly.com/manage-rds-connections-aws-lambda/
  130. 130. set “wait_timeout” and “interactive_timeout” to 10 mins (default is 8 hours!)
  131. 131. increase “max_connections” setting
  132. 132. set client socket pool size to 1
  133. 133. #13 (lack of) observability
  134. 134. [Timeline: happened, system repaired, user impact] reduce MTTR
  135. 135. Identify & Resolve Issues; Understanding costs; Visibility
  137. 137. [Timeline: happened, system repaired, user impact] MTTDiscovery
  138. 138. “What alerts should I have?”
  139. 139. It depends on what you’re building…
  140. 140. But, this is a good starting point
  141. 141. (built up over slides 141-145)
      Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency
      API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %
      SQS: message age
      Step Functions: failed count, throttle count, timed out count
  146. 146. monitor and alert on message flow rate for event processing pipelines
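One way to start codifying the alerts listed above is to keep them as data and evaluate them against a metrics snapshot. This is only a sketch: the metric key names, the snapshot shape and every threshold below are hypothetical placeholders, not recommendations from the talk, and a real setup would create alarms in the monitoring service rather than evaluate them in application code.

```javascript
// Alert definitions as data. All thresholds are made-up placeholders.
const alerts = [
  { service: "Lambda", metric: "errorRatePercent", threshold: 1 },
  { service: "Lambda", metric: "throttleCount", threshold: 0 },
  { service: "Lambda", metric: "iteratorAgeMs", threshold: 60000 },
  { service: "API Gateway", metric: "p99LatencyMs", threshold: 1000 },
  { service: "API Gateway", metric: "5xxRatePercent", threshold: 1 },
  { service: "SQS", metric: "messageAgeSeconds", threshold: 300 },
  { service: "Step Functions", metric: "failedCount", threshold: 0 },
];

// Error rate as a percentage; zero invocations counts as a 0% error rate.
function errorRatePercent(errors, invocations) {
  return invocations === 0 ? 0 : (errors * 100) / invocations;
}

// Returns the definitions whose metric in the snapshot exceeds its threshold.
// Metrics missing from the snapshot are skipped rather than treated as firing.
function firingAlerts(snapshot, definitions = alerts) {
  return definitions.filter(({ service, metric, threshold }) => {
    const value = snapshot[service]?.[metric];
    return typeof value === "number" && value > threshold;
  });
}

const snapshot = {
  Lambda: { errorRatePercent: errorRatePercent(5, 100), throttleCount: 0 },
  SQS: { messageAgeSeconds: 900 },
};

for (const { service, metric } of firingAlerts(snapshot)) {
  console.log(`ALERT ${service} ${metric}`);
}
```

Keeping the definitions in one list makes it straightforward to apply the same baseline to every new function or queue, then tune thresholds per service.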
  147. 147. “Can’t you codify these?”
  148. 148. https://theburningmonk.com/hire-me Advise, Training, Delivery. “Fundamentally, Yan has improved our team by increasing our ability to derive value from AWS and Lambda in particular.” Nick Blair, Tech Lead
  149. 149. Production-Ready Serverless
  150. 150. bit.ly/prod-ready-sls-ams-2019 Backspace Escape Rooms Transformatorweg 30A 20% off with aws-ug-ams-0109
  152. 152. @theburningmonk theburningmonk.com github.com/theburningmonk
