Beware the potholes on the road to serverless

Looking in from the outside, serverless seems so simple! And yet, many companies are struggling on their journey to serverless. In this talk, I highlight a number of mistakes companies are making when they adopt serverless.

Recording: https://www.youtube.com/watch?v=4hzNqxo0yTA

Real-world serverless podcast: https://realworldserverless.com
Learn Lambda best practices: https://lambdabestpractice.com
Blog: https://theburningmonk.com
Consulting services: https://theburningmonk.com/hire-me
Production-Ready Serverless workshop: https://productionreadyserverless.com

Published in: Technology

  1. 1. MIND THE POTHOLES
  2. 2. @theburningmonk theburningmonk.com back in the day…
  3. 3. we’re getting a new server in 3 months time
  4. 4. yay! about time! hooray!! finally!
  5. 5. but we have to decide what dependencies to install on it now..
  7. 7. @theburningmonk theburningmonk.com on premise VMs in the cloud
  8. 8. @theburningmonk theburningmonk.com on premise VMs in the cloud abstraction level
  9. 9. @theburningmonk theburningmonk.com on premise VMs in the cloud abstraction level productivity
  11. 11. @theburningmonk theburningmonk.com less is more
  14. 14. @theburningmonk theburningmonk.com right?
  16. 16. @theburningmonk theburningmonk.com “why are we failing at this?”
  17. 17. hidden dangers
  18. 18. @theburningmonk theburningmonk.com monolith microservices serverless
  19. 19. @theburningmonk theburningmonk.com monolith microservices serverless observability distributed systems bounded context
  21. 21. @theburningmonk theburningmonk.com monolith microservices serverless observability distributed systems bounded context event driven
  22. 22. @theburningmonk theburningmonk.com monolith serverless missing learnings from microservices
  23. 23. @theburningmonk theburningmonk.com monolith serverless missing learnings from microservices poor decisions
  24. 24. Yan Cui http://theburningmonk.com @theburningmonk AWS user for 10 years
  25. 25. http://bit.ly/yubl-serverless
  26. 26. Yan Cui http://theburningmonk.com @theburningmonk Developer Advocate @
  27. 27. Yan Cui http://theburningmonk.com @theburningmonk Independent Consultant: advise, training, delivery
  28. 28. theburningmonk.com/courses
  29. 29. theburningmonk.com/workshops in your company, flexible dates: 4-week virtual workshop, May 4 - May 29; Amsterdam, Jul 7-8; Helsinki, Aug 20-21; London, Sep 24-25; Berlin, Oct 8-9
  30. 30. @theburningmonk theburningmonk.com #1 not letting go of legacy thinking; #2 one account that rules them all; #3 do first, research later; #4 not using a deployment framework; #5 console-driven development; #6 one repo per function; #7 unencrypted secrets in env vars; #8 not following least privilege principle; #9 missing DLQs; #10 too much/too little concurrency; #11 cold starts; #12 RDS connection handling; #13 (lack of) observability
  31. 31. #1 not letting go of legacy thinking
  32. 32. “we’re doing serverless, but why aren’t things going faster?”
  33. 33. @theburningmonk theburningmonk.com Socio Technical
  34. 34. @theburningmonk theburningmonk.com there are no silver bullets
  36. 36. @theburningmonk theburningmonk.com centralised team Team A Team B Team C Team D …
  37. 37. @theburningmonk theburningmonk.com “but the developers don’t understand AWS and how our infrastructure is set up”
  38. 38. @theburningmonk theburningmonk.com “but the developers don’t understand AWS and how our infrastructure is set up” let’s solve this problem instead!
  39. 39. @theburningmonk theburningmonk.com what got you here won’t get you there
  40. 40. @theburningmonk theburningmonk.com Monolithic Functions: if (path == "/user" && method == "GET") { return getUser(…); } else if (path == "/user" && method == "DELETE") { return deleteUser(…); } else if (path == "/user" && method == "POST") { return createUser(…); } else if …
  41. 41. @theburningmonk theburningmonk.com Single-Purposed Functions: GET /user, POST /user, DELETE /user
  42. 42. @theburningmonk theburningmonk.com Monolithic: user-api-dev. Single-Purposed: user-api-dev-get-user, user-api-dev-create-user, user-api-dev-delete-user. Every function is tagged author: yan.cui, feature: user-api
  43. 43. @theburningmonk theburningmonk.com (same diagram as slide 42) find related functions by prefix
  44. 44. @theburningmonk theburningmonk.com (same diagram as slide 42) discoverability (without having to dig into the code)
  45. 45. @theburningmonk theburningmonk.com (same diagram as slide 42) what does it do?
  46. 46. @theburningmonk theburningmonk.com (same diagram as slide 42) dynamodb:GetItem, dynamodb:PutItem, dynamodb:DeleteItem
  47. 47. @theburningmonk theburningmonk.com (same diagram as slide 42) dynamodb:GetItem, dynamodb:PutItem, dynamodb:DeleteItem: no least privilege…
  48. 48. @theburningmonk theburningmonk.com (same diagram as slide 42) require(x), require(y), require(z)
  51. 51. @theburningmonk theburningmonk.com more dependencies equals slower cold start
  52. 52. @theburningmonk theburningmonk.com (same diagram as slide 42) require(x), require(y), require(z): worse cold start performance
  53. 53. @theburningmonk theburningmonk.com keep functions simple, and single-purposed
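
As an illustrative sketch (not code from the talk), the single-purposed layout from slides 41 to 46 might look like the following in Node.js. The in-memory `users` map stands in for a DynamoDB table, and the `pathParameters.id` lookup and the response shapes are assumptions made for the example. Each handler is annotated with the single IAM action it would need, which is what makes least privilege practical.

```javascript
// Hypothetical in-memory stand-in for a DynamoDB table, so the sketch runs
// without the AWS SDK. A real function would call DynamoDB instead.
const users = new Map();

// user-api-dev-get-user: would need only dynamodb:GetItem
const getUser = async (event) => {
  const user = users.get(event.pathParameters.id);
  return user
    ? { statusCode: 200, body: JSON.stringify(user) }
    : { statusCode: 404, body: JSON.stringify({ message: "not found" }) };
};

// user-api-dev-create-user: would need only dynamodb:PutItem
const createUser = async (event) => {
  const user = JSON.parse(event.body);
  users.set(user.id, user);
  return { statusCode: 201, body: JSON.stringify(user) };
};

// user-api-dev-delete-user: would need only dynamodb:DeleteItem
const deleteUser = async (event) => {
  users.delete(event.pathParameters.id);
  return { statusCode: 204, body: "" };
};
```

Because each handler touches one operation, each can be deployed as its own function with its own role and its own, smaller, set of dependencies.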
  54. 54. #2 one account that rules them all
  55. 55. @theburningmonk theburningmonk.com mind the shared limits
  56. 56. @theburningmonk theburningmonk.com Resource Limits: no. of DynamoDB tables, no. of API Gateway regional APIs, no. of API Gateway edge-optimized APIs, no. of Kinesis shards, no. of IAM roles, no. of S3 buckets, no. of CloudFormation stacks, no. of SNS subscription filters, no. of SSM parameters, …
  57. 57. @theburningmonk theburningmonk.com Throughput Limits: DynamoDB read & write, API Gateway requests/second, Lambda concurrent executions, SSM parameter ops/second, …
  59. 59. @theburningmonk theburningmonk.com compartmentalise security breaches
  60. 60. @theburningmonk theburningmonk.com One account per Team per Environment
  62. 62. @theburningmonk theburningmonk.com https://github.com/OlafConijn/AwsOrganizationFormation
  64. 64. @theburningmonk theburningmonk.com https://github.com/OlafConijn/AwsOrganizationFormation Accounts, Org Units, SCPs, Pwd Policies, Multi-Region, Pseudo-Funs, Init & Validate, CI/CD
  65. 65. #3 do first, research later
  66. 66. @theburningmonk theburningmonk.com https://einaregilsson.com/serverless-15-percent-slower-and-eight-times-more-expensive/
  70. 70. @theburningmonk theburningmonk.com the platforms need to do better at educating users on how to choose between different services
  71. 71. @theburningmonk theburningmonk.com SNS vs SQS vs Kinesis vs MSK? the platforms need to do better at educating users on how to choose between different services
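  72. 72. @theburningmonk theburningmonk.com comparison by service:
      Kinesis: subscribers many; ordering by shard; replay events up to 7 days; mode batched; retry: retried until success (customizable); concurrency 1 per shard
      SQS: subscribers one-to-one; ordering none (standard), global (FIFO); replay events none; mode batched (up to 10); retry: retry + DLQ; concurrency auto-scaled
      SNS: subscribers many; ordering none; replay events none; mode singular; retry: retry + DLQ; concurrency fan-out!!!
      EventBridge: subscribers many; ordering none; replay events none; mode singular; retry: retry + DLQ; concurrency fan-out!!!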
  73. 73. @theburningmonk theburningmonk.com https://medium.com/theburningmonk-com/all-my-posts-on-serverless-aws-lambda-43c17a147f91
  74. 74. @theburningmonk theburningmonk.com https://www.jeremydaly.com/newsletter/
  75. 75. #4 not using a deployment toolkit
  77. 77. @theburningmonk theburningmonk.com https://lumigo.io/blog/comparison-of-lambda-deployment-frameworks/
  78. 78. @theburningmonk theburningmonk.com don’t write your own deployment framework
  79. 79. #5 console-driven development
  82. 82. #6 one repo per function
  83. 83. @theburningmonk theburningmonk.com github repo × 9
  85. 85. @theburningmonk theburningmonk.com monorepo?
  86. 86. @theburningmonk theburningmonk.com github repo
  87. 87. @theburningmonk theburningmonk.com one repo per service?
  88. 88. @theburningmonk theburningmonk.com one github repo each for user-api, timeline-api, relationship-api, search-api
  89. 89. @theburningmonk theburningmonk.com https://lumigo.io/blog/mono-repo-vs-one-per-service
  97. 97. @theburningmonk theburningmonk.com one github repo each for user-api, timeline-api, relationship-api, search-api
  98. 98. @theburningmonk theburningmonk.com CI/CD pipeline per service
  99. 99. @theburningmonk theburningmonk.com functions are deployed together, as a stack
  100. 100. #7 unencrypted secrets in env vars
  101. 101. @theburningmonk theburningmonk.com secrets should NEVER be in plain text in env variables
  102. 102. @theburningmonk theburningmonk.com SSM Parameter Store (Secret 1, Secret 2) + IAM; two functions, each with Environment: SECRET_1: …, SECRET_2: …
  103. 103. @theburningmonk theburningmonk.com SSM Parameter Store (Secret 1, Secret 2) + IAM; two functions, each with Environment: SECRET_1: …, SECRET_2: … yay!
  106. 106. @theburningmonk theburningmonk.com SSM Parameter Store (Secret 1, Secret 2) + IAM: fetch at cold start, cache, invalidate every x mins
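
A minimal sketch of the "fetch at cold start, cache, invalidate every x mins" idea, under stated assumptions: `createSecretsCache` and `fetchSecrets` are hypothetical names, and `fetchSecrets` stands in for whatever call actually reads from SSM Parameter Store. The clock is injectable only so the expiry logic can be exercised without waiting. The middy project linked on the next slide is one existing option for this; the code below is not its API.

```javascript
// Wraps any async fetch function with a time-based cache.
// `fetchSecrets` is a hypothetical stand-in for a call to SSM Parameter Store.
const createSecretsCache = (fetchSecrets, ttlMs, now = Date.now) => {
  let cached;        // last fetched value
  let expiresAt = 0; // nothing cached yet, so the first call always fetches

  return async () => {
    if (cached === undefined || now() >= expiresAt) {
      cached = await fetchSecrets();
      expiresAt = now() + ttlMs;
    }
    return cached;
  };
};

// Declared outside the handler so the cache survives across invocations
// that are served by the same container.
const getSecrets = createSecretsCache(
  async () => ({ SECRET_1: "placeholder-value" }), // placeholder fetch
  5 * 60 * 1000 // refresh after five minutes
);

const handler = async () => {
  const { SECRET_1 } = await getSecrets();
  // Use the secret here; never log it or copy it into an env variable.
  return { statusCode: 200, body: `secret length: ${SECRET_1.length}` };
};
```

With this shape the decrypted value only ever lives in memory, and rotating a secret takes effect within one cache interval without a redeploy.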
  107. 107. @theburningmonk theburningmonk.com https://github.com/middyjs/middy
  109. 109. @theburningmonk theburningmonk.com SSM Parameter Store (Secret 1, Secret 2) + IAM: switch to Higher Throughput if you need more than 40 ops/s
  110. 110. #8 not following least privilege principle
  113. 113. #9 missing DLQs
  114. 114. @theburningmonk theburningmonk.com sync: Cognito, Alexa, Lex, API Gateway; async: S3, SNS, SES, CloudFormation, CloudWatch Logs, CloudWatch Events, Scheduled Events, CodeCommit, AWS Config; pulling: DynamoDB Stream, Kinesis Stream, SQS (http://amzn.to/2v7Kc3b)
  115. 115. @theburningmonk theburningmonk.com sync: Cognito, Alexa, Lex, API Gateway; async: S3, SNS, SES, CloudFormation, CloudWatch Logs, CloudWatch Events, Scheduled Events, CodeCommit, AWS Config; pulling: DynamoDB Stream, Kinesis Stream, SQS (http://amzn.to/2vs2lIg); for async sources, Lambda handles retries (twice, then DLQ)
  116. 116. @theburningmonk theburningmonk.com configure DLQ for async functions so you don’t lose failed events
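
To make "twice, then DLQ" concrete, here is a simplified model of what happens to a failed event on an async invocation. It is a sketch of the behaviour described on the slide, not Lambda's implementation: `invokeAsync` is a hypothetical name, the dead-letter queue is a plain array, and the delays the real service adds between attempts are left out.

```javascript
// One initial attempt plus two retries, as described on the slide.
const MAX_ATTEMPTS = 3;

// `deadLetterQueue` is a plain array standing in for a real queue or topic.
const invokeAsync = async (handler, event, deadLetterQueue) => {
  let lastError;
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      return await handler(event);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  // All attempts failed: keep the event so it can be inspected or replayed.
  // Without a configured DLQ, the event would simply be dropped at this point.
  deadLetterQueue.push({ event, error: lastError.message });
  return undefined;
};
```

The final `push` is the step that goes missing when no DLQ is configured, which is why failed events disappear silently.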
  121. 121. #10 too much/too little concurrency
  122. 122. @theburningmonk theburningmonk.com “Lambda generates too much load for the downstream system”
  123. 123. @theburningmonk theburningmonk.com one invocation per message SNS Lambda
  124. 124. @theburningmonk theburningmonk.com Downstream System SNS Lambda
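  125. 125. @theburningmonk theburningmonk.com comparison by service:
      Kinesis: subscribers many; ordering by shard; replay events up to 7 days; mode batched; retry: retried until success (customizable); concurrency 1 per shard
      SQS: subscribers one-to-one; ordering none (standard), global (FIFO); replay events none; mode batched (up to 10); retry: retry + DLQ; concurrency auto-scaled
      SNS: subscribers many; ordering none; replay events none; mode singular; retry: retry + DLQ; concurrency fan-out!!!
      EventBridge: subscribers many; ordering none; replay events none; mode singular; retry: retry + DLQ; concurrency fan-out!!!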
  126. 126. @theburningmonk theburningmonk.com if you want… maximum throughput: SNS; precise control over throughput: Kinesis
  127. 127. @theburningmonk theburningmonk.com if you want… maximum throughput: SNS; precise control over throughput: Kinesis; how quickly it scales out
  128. 128. @theburningmonk theburningmonk.com if you want… maximum throughput: SNS; precise control over throughput: Kinesis; how quickly it scales out; SQS, DynamoDB Streams
  130. 130. #11 cold starts
  131. 131. @theburningmonk theburningmonk.com “cold starts only happen to the first request”
  132. 132. @theburningmonk theburningmonk.com concurrent execution (i.e. a container) vs. function invocation
  133. 133. @theburningmonk theburningmonk.com concurrent execution (i.e. a container) is like a class instance; function invocation is like a method call
  134. 134. @theburningmonk theburningmonk.com Lambda scales the number of concurrent executions based on traffic
  135. 135. @theburningmonk theburningmonk.com existing “containers” are reused where possible
  136-145. @theburningmonk theburningmonk.com [animated timeline: invocations plotted against time, building up from one invocation to eight]
  150. 150. @theburningmonk theburningmonk.com [timeline: invocations plotted against time, interleaved with pings]
  151. 151. @theburningmonk theburningmonk.com Lambda warmers don’t work when you have > 1 concurrent executions
  152. 152. @theburningmonk theburningmonk.com FREQUENCY DURATION
  153. 153. @theburningmonk theburningmonk.com FREQUENCY DURATION dictated by user traffic, out of your control
  154. 154. @theburningmonk theburningmonk.com cold starts are generally not an issue if you have a steady traffic pattern
  155. 155. @theburningmonk theburningmonk.com time req/s
  156. 156. @theburningmonk theburningmonk.com time req/s El Clásico
  157. 157. @theburningmonk theburningmonk.com time req/s lunch dinner
  158. 158. @theburningmonk theburningmonk.com FREQUENCY DURATION optimize this!
  159. 159. @theburningmonk theburningmonk.com minimise the duration of cold starts so they fall within acceptable latency range
  160. 160. @theburningmonk theburningmonk.com time req/s lunch dinner Provisioned Concurrency
  161. 161. @theburningmonk theburningmonk.com time req/s lunch dinner Provisioned Concurrency On-Demand Concurrency
  162. 162. @theburningmonk theburningmonk.com https://lumigo.io/blog/provisioned-concurrency-the-end-of-cold-starts/
  163. 163. @theburningmonk theburningmonk.com there are no silver bullets
  164. 164. @theburningmonk theburningmonk.com Provisioned Concurrency is a powerful tool IFF you have a cold start problem; don’t use it by default
  165. 165. #12 RDS connection handling
  166. 166. @theburningmonk theburningmonk.com default RDS configs are bad for Lambda
  167. 167. @theburningmonk theburningmonk.com default RDS configs are bad for Lambda: idle connections are not closed; too many connections per “container”; max open connections is too low
  168. 168. @theburningmonk theburningmonk.com https://www.jeremydaly.com/manage-rds-connections-aws-lambda/
  169. 169. @theburningmonk theburningmonk.com set “wait_timeout” and “interactive_timeout” to 10 mins (default is 8 hours!)
  170. 170. @theburningmonk theburningmonk.com increase “max_connections” setting
  171. 171. @theburningmonk theburningmonk.com set client socket pool size to 1
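
On the client side, a pool size of 1 usually goes together with reusing that one connection across invocations. The sketch below shows the reuse part only, under stated assumptions: `createConnectionProvider` is a hypothetical helper, and `fakeConnect` is a placeholder for a real database client's connect call, which is also where a pool size of 1 would be configured.

```javascript
// Returns a function that opens the connection on first use and then hands
// back the same one on every later call.
const createConnectionProvider = (connect) => {
  let pending; // promise for the single shared connection

  return () => {
    if (!pending) {
      pending = connect().catch((err) => {
        pending = undefined; // allow a fresh attempt on the next invocation
        throw err;
      });
    }
    return pending;
  };
};

// Placeholder connect function so the sketch runs without a database.
const fakeConnect = async () => ({ query: async (sql) => `ran: ${sql}` });

// Created outside the handler: one connection per container, not per request.
const getConnection = createConnectionProvider(fakeConnect);

const handler = async () => {
  const connection = await getConnection();
  return connection.query("SELECT 1");
};
```

Each container then holds at most one open connection, so the number of connections tracks the function's concurrency, and the shorter server-side timeouts from slide 169 clean up after containers that have been recycled.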
  174. 174. #13 (lack of) observability
  175. 175. @theburningmonk theburningmonk.com happened, system repaired, user impact: reduce MTTR
  176. 176. @theburningmonk theburningmonk.com Identify & Resolve Issues, Understanding costs, Visibility
  178. 178. @theburningmonk theburningmonk.com happened, system repaired, user impact: MTTDiscovery
  180. 180. @theburningmonk theburningmonk.com “What alerts should I have?”
  181. 181. @theburningmonk theburningmonk.com It depends on what you’re building…
  182. 182. @theburningmonk theburningmonk.com But, this is a good starting point
  183. 183. @theburningmonk theburningmonk.com Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency
  184. 184. @theburningmonk theburningmonk.com Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %
  185. 185. @theburningmonk theburningmonk.com Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %. SQS: message age
  186. 186. @theburningmonk theburningmonk.com Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %. SQS: message age. Step Functions: failed count, throttle count, timed out count
  188. 188. @theburningmonk theburningmonk.com monitor and alert on message flow rate for event processing pipelines
  189. 189. @theburningmonk theburningmonk.com “Can’t you codify these?”
  191. 191. @theburningmonk theburningmonk.com #1 not letting go of legacy thinking; #2 one account that rules them all; #3 do first, research later; #4 not using a deployment framework; #5 console-driven development; #6 one repo per function; #7 unencrypted secrets in env vars; #8 not following least privilege principle; #9 missing DLQs; #10 too much/too little concurrency; #11 cold starts; #12 RDS connection handling; #13 (lack of) observability
  192. 192. https://theburningmonk.com/hire-me Advise, Training, Delivery. “Fundamentally, Yan has improved our team by increasing our ability to derive value from AWS and Lambda in particular.” Nick Blair, Tech Lead
  193. 193. @theburningmonk theburningmonk.com Production-Ready Serverless
  194. 194. in your company, flexible dates: 4-week virtual workshop, May 4 - May 29; Amsterdam, Jul 7-8; Helsinki, Aug 20-21; London, Sep 24-25; Berlin, Oct 8-9. @theburningmonk theburningmonk.com theburningmonk.com/workshops slsdays-virtual-202004 €100 off all my workshops
  195. 195. @theburningmonk theburningmonk.com lambdabestpractice.com bit.ly/complete-guide-to-aws-step-functions 20% off my courses slsdays-virtual-202004
  196. 196. @theburningmonk theburningmonk.com github.com/theburningmonk
