Beware the potholes on the road to serverless

4

Share

Loading in …3
×
1 of 200
1 of 200

Beware the potholes on the road to serverless

4

Share

Download to read offline

Description

Looking in from the outside, serverless seems so simple! And yet, many companies are struggling on their journey to serverless. In this talk, I highlight a number of mistakes companies are making when they adopt serverless.

Recording: https://www.youtube.com/watch?v=4hzNqxo0yTA

Real-world serverless podcast: https://realworldserverless.com
Learn Lambda best practices: https://lambdabestpractice.com
Blog: https://theburningmonk.com
Consulting services: https://theburningmonk.com/hire-me
Production-Ready Serverless workshop: https://productionreadyserverless.com

Transcript

  1. MIND THE POTHOLES
  2. @theburningmonk theburningmonk.com back in the day…
  3. we’re getting a new server in 3 months’ time
  4. yay! about time! hooray!! finally!
  5. but we have to decide what dependencies to install on it now…
  6.
  7. on-premise, VMs in the cloud
  8. on-premise, VMs in the cloud (label: abstraction level)
  9. on-premise, VMs in the cloud (labels: abstraction level, productivity)
  10. on-premise, VMs in the cloud (labels: abstraction level, productivity)
  11. less is more
  12.
  13.
  14. right?
  15.
  16. “why are we failing at this?”
  17. hidden dangers
  18. monolith, microservices, serverless
  19. monolith, microservices, serverless (observability, distributed systems, bounded context)
  20. monolith, microservices, serverless (observability, distributed systems, bounded context)
  21. monolith, microservices, serverless (observability, distributed systems, bounded context, event driven)
  22. monolith, serverless: missing learnings from microservices
  23. monolith, serverless: missing learnings from microservices, poor decisions
  24. Yan Cui, http://theburningmonk.com, @theburningmonk: AWS user for 10 years
  25. http://bit.ly/yubl-serverless
  26. Yan Cui: Developer Advocate @
  27. Yan Cui: Independent Consultant (advise, training, delivery)
  28. theburningmonk.com/courses
  29. theburningmonk.com/workshops: in your company, flexible dates; Helsinki, Aug 20-21; London, Sep 24-25; Berlin, Oct 8-9; 4-week virtual workshop, May 4 - May 29; Amsterdam, Jul 7-8
  30. #1 not letting go of legacy thinking
      #2 one account that rules them all
      #3 do first, research later
      #4 not using a deployment framework
      #5 console-driven development
      #6 one repo per function
      #7 unencrypted secrets in env vars
      #8 not following least privilege principle
      #9 missing DLQs
      #10 too much/too little concurrency
      #11 cold starts
      #12 RDS connection handling
      #13 (lack of) observability
  31. #1 not letting go of legacy thinking
  32. “we’re doing serverless, but why aren’t things going faster?”
  33. Socio-Technical
  34. there are no silver bullets
  35.
  36. centralised team; Team A, Team B, Team C, Team D, …
  37. “but the developers don’t understand AWS and how our infrastructure is set up”
  38. “but the developers don’t understand AWS and how our infrastructure is set up”: let’s solve this problem instead!
  39. what got you here won’t get you there
  40. Monolithic Functions: if (path == "/user" && method == "GET") { return getUser(…); } else if (path == "/user" && method == "DELETE") { return deleteUser(…); } else if (path == "/user" && method == "POST") { return createUser(…); } else if …
  41. Single-Purposed Functions: GET /user, POST /user, DELETE /user
  42. Monolithic: user-api-dev (author: yan.cui, feature: user-api). Single-Purposed: user-api-dev-get-user, user-api-dev-create-user, user-api-dev-delete-user (each with author: yan.cui, feature: user-api)
  43. (slide 42 diagram) find related functions by prefix
  44. (slide 42 diagram) discoverability (without having to dig into the code)
  45. (slide 42 diagram) what does it do?
  46. (slide 42 diagram) dynamodb:GetItem, dynamodb:PutItem, dynamodb:DeleteItem
  47. (slide 42 diagram) dynamodb:GetItem, dynamodb:PutItem, dynamodb:DeleteItem: no least privilege…
  48. (slide 42 diagram) require(x), require(y), require(z)
  49. (slide 42 diagram) require(x), require(y), require(z)
  50.
  51. more dependencies mean slower cold starts
  52. (slide 42 diagram) require(x), require(y), require(z): worse cold start performance
  53. keep functions simple and single-purposed
  54. #2 one account that rules them all
  55. mind the shared limits
  56. Resource Limits: no. of DynamoDB tables, no. of API Gateway regional APIs, no. of API Gateway edge-optimized APIs, no. of Kinesis shards, no. of IAM roles, no. of S3 buckets, no. of CloudFormation stacks, no. of SNS subscription filters, no. of SSM parameters, …
  57. Throughput Limits: DynamoDB read & write, API Gateway requests/second, Lambda concurrent executions, SSM parameter ops/second, …
  58.
  59. compartmentalise security breaches
  60. One account per Team per Environment
  61.
  62. https://github.com/OlafConijn/AwsOrganizationFormation
  63. https://github.com/OlafConijn/AwsOrganizationFormation
  64. https://github.com/OlafConijn/AwsOrganizationFormation: Accounts, Org Units, SCPs, Pwd Policies, Multi-Region, Pseudo-Funs, Init & Validate, CI/CD
  65. #3 do first, research later
  66. https://einaregilsson.com/serverless-15-percent-slower-and-eight-times-more-expensive/
  67.
  68.
  69.
  70. the platforms need to do better at educating users on how to choose between different services
  71. SNS vs SQS vs Kinesis vs MSK? the platforms need to do better at educating users on how to choose between different services
  72. Kinesis vs SQS vs SNS vs EventBridge (values given as Kinesis | SQS | SNS | EventBridge):
      ordering: by shard | none (standard), global (FIFO) | none | none
      replay events: up to 7 days | none | none | none
      mode: batched | batched (up to 10) | singular | singular
      retry: retried until success (customizable) | retry + DLQ | retry + DLQ | retry + DLQ
      concurrency: 1 per shard | auto-scaled | fan-out!!! | fan-out!!!
      subscribers: many | one-to-one | many | many
  73. https://medium.com/theburningmonk-com/all-my-posts-on-serverless-aws-lambda-43c17a147f91
  74. https://www.jeremydaly.com/newsletter/
  75. #4 not using a deployment toolkit
  76.
  77. https://lumigo.io/blog/comparison-of-lambda-deployment-frameworks/
  78. don’t write your own deployment framework
  79. #5 console-driven development
  80.
  81.
  82. #6 one repo per function
  83. diagram: nine separate GitHub repos
  84. diagram: nine separate GitHub repos
  85. monorepo?
  86. diagram: a single GitHub repo
  87. one repo per service?
  88. diagram: GitHub repos for user-api, timeline-api, relationship-api, search-api
  89. https://lumigo.io/blog/mono-repo-vs-one-per-service
  90.
  91.
  92.
  93.
  94.
  95.
  96.
  97. diagram: GitHub repos for user-api, timeline-api, relationship-api, search-api
  98. CI/CD pipeline per service
  99. functions are deployed together, as a stack
  100. #7 unencrypted secrets in env vars
  101. secrets should NEVER be in plain text in env variables
  102. diagram: SSM Parameter Store (Secret 1, Secret 2) and IAM; functions with Environment: SECRET_1: …, SECRET_2: …
  103. (slide 102 diagram) yay!
  104.
  105.
  106. diagram: SSM Parameter Store (Secret 1, Secret 2) and IAM: fetch at cold start, cache, invalidate every x mins
  107. https://github.com/middyjs/middy
  108.
  109. diagram: SSM Parameter Store (Secret 1, Secret 2) and IAM: switch to Higher Throughput if you need more than 40 ops/s
  110. #8 not following least privilege principle
  111.
  112.
  113. #9 missing DLQs
  114. async: S3, SNS, SES, CloudFormation, CloudWatch Logs, CloudWatch Events, Scheduled Events, CodeCommit, AWS Config; sync: Cognito, Alexa, Lex, API Gateway; pulling: DynamoDB Stream, Kinesis Stream, SQS (http://amzn.to/2v7Kc3b)
  115. async: S3, SNS, SES, CloudFormation, CloudWatch Logs, CloudWatch Events, Scheduled Events, CodeCommit, AWS Config; sync: Cognito, Alexa, Lex, API Gateway; pulling: DynamoDB Stream, Kinesis Stream, SQS (http://amzn.to/2vs2lIg). For async sources, Lambda handles retries (twice, then DLQ)
  116. configure DLQ for async functions so you don’t lose failed events
  117.
  118.
  119.
  120.
  121. #10 too much/too little concurrency
  122. “Lambda generates too much load for the downstream system”
  123. SNS → Lambda: one invocation per message
  124. SNS → Lambda → Downstream System
  125. the Kinesis vs SQS vs SNS vs EventBridge comparison table from slide 72
  126. if you want… maximum throughput: SNS; precise control over throughput: Kinesis
  127. if you want… maximum throughput: SNS; precise control over throughput: Kinesis; how quickly it scales out
  128. if you want… maximum throughput: SNS; precise control over throughput: Kinesis; how quickly it scales out; also shown: SQS, DynamoDB Streams
  129. the Kinesis vs SQS vs SNS vs EventBridge comparison table from slide 72
  130. #11 cold starts
  131. “cold starts only happen to the first request”
  132. function invocation vs concurrent execution (i.e. a container)
  133. function invocation ≈ method call; concurrent execution (i.e. a container) ≈ class instance
  134. Lambda scales the number of concurrent executions based on traffic
  135. existing “containers” are reused where possible
  136. timeline diagram: 1 invocation
  137. timeline diagram: 2 invocations
  138. timeline diagram: 2 invocations
  139. timeline diagram: 4 invocations
  140. timeline diagram: 6 invocations
  141. timeline diagram: 6 invocations
  142. timeline diagram: 7 invocations
  143. timeline diagram: 8 invocations
  144. timeline diagram: 8 invocations
  145. timeline diagram: 8 invocations
  146.
  147.
  148.
  149.
  150. timeline diagram: invocations interleaved with warmer pings
  151. Lambda warmers don’t work when you have > 1 concurrent execution
  152. FREQUENCY; DURATION
  153. FREQUENCY: dictated by user traffic, out of your control; DURATION
  154. cold starts are generally not an issue if you have a steady traffic pattern
  155. chart: req/s over time
  156. chart: req/s over time, annotated “El Clásico”
  157. chart: req/s over time, annotated lunch and dinner
  158. FREQUENCY; DURATION: optimize this!
  159. minimise the duration of cold starts so they fall within the acceptable latency range
  160. chart: req/s over time (lunch, dinner): Provisioned Concurrency
  161. chart: req/s over time (lunch, dinner): Provisioned Concurrency, On-Demand Concurrency
  162. https://lumigo.io/blog/provisioned-concurrency-the-end-of-cold-starts/
  163. there are no silver bullets
  164. Provisioned Concurrency is a powerful tool IFF you have a cold start problem; don’t use it by default
  165. #12 RDS connection handling
  166. default RDS configs are bad for Lambda
  167. default RDS configs are bad for Lambda: idle connections are not closed; too many connections per “container”; the max number of open connections is too low
  168. https://www.jeremydaly.com/manage-rds-connections-aws-lambda/
  169. set “wait_timeout” and “interactive_timeout” to 10 mins (default is 8 hours!)
  170. increase the “max_connections” setting
  171. set the client socket pool size to 1
  172.
  173.
  174. #13 (lack of) observability
  175. timeline: happened → system repaired (user impact); reduce MTTR
  176. Visibility: Identify & Resolve Issues, Understanding costs
  177. Visibility: Identify & Resolve Issues, Understanding costs
  178. timeline: happened → system repaired (user impact); MTTDiscovery
  179.
  180. “What alerts should I have?”
  181. It depends on what you’re building…
  182. But this is a good starting point
  183. Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency
  184. Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency; API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %
  185. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %; SQS: message age; Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency
  186. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %; SQS: message age; Step Functions: failed count, throttle count, timed out count; Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency
  187. SQS: message age; Step Functions: failed count, throttle count, timed out count; API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %; Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency
  188. monitor and alert on message flow rate for event processing pipelines
  189. “Can’t you codify these?”
  190.
  191. recap: potholes #1 to #13, as listed on slide 30
  192. https://theburningmonk.com/hire-me: Advise, Training, Delivery. “Fundamentally, Yan has improved our team by increasing our ability to derive value from AWS and Lambda in particular.” (Nick Blair, Tech Lead)
  193. Production-Ready Serverless
  194. theburningmonk.com/workshops: in your company, flexible dates; Helsinki, Aug 20-21; London, Sep 24-25; Berlin, Oct 8-9; Amsterdam, Jul 7-8; 4-week virtual workshop, May 4 - May 29; €100 off all my workshops with code slsdays-virtual-202004
  195. lambdabestpractice.com, bit.ly/complete-guide-to-aws-step-functions: 20% off my courses with code slsdays-virtual-202004
  196. github.com/theburningmonk
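
Slides 46 and 47 (and pothole #8) point out that a monolithic function must hold the union of the permissions that all of its routes need. The sketch below rests on a few assumptions. It uses a single DynamoDB table whose ARN is made up, and `allowOnly` is a hypothetical helper. It builds standard IAM policy documents for the three single-purposed functions, then derives the broader grant that the monolithic `user-api-dev` function would need:

```javascript
// Hypothetical example: this table ARN is a placeholder, not a real resource.
const USERS_TABLE_ARN = "arn:aws:dynamodb:eu-west-1:123456789012:table/users";

// Builds an IAM policy document that allows only the given actions on one resource.
function allowOnly(actions, resourceArn) {
  return {
    Version: "2012-10-17",
    Statement: [{ Effect: "Allow", Action: [...actions], Resource: resourceArn }],
  };
}

// Single-purposed functions: each gets exactly the one DynamoDB action it uses.
const perFunctionPolicies = {
  "user-api-dev-get-user": allowOnly(["dynamodb:GetItem"], USERS_TABLE_ARN),
  "user-api-dev-create-user": allowOnly(["dynamodb:PutItem"], USERS_TABLE_ARN),
  "user-api-dev-delete-user": allowOnly(["dynamodb:DeleteItem"], USERS_TABLE_ARN),
};

// Monolithic function: has to be granted every action that any of its routes uses.
const monolithicPolicy = allowOnly(
  Object.values(perFunctionPolicies).flatMap((p) => p.Statement[0].Action),
  USERS_TABLE_ARN
);

console.log(JSON.stringify(monolithicPolicy.Statement[0].Action));
// prints ["dynamodb:GetItem","dynamodb:PutItem","dynamodb:DeleteItem"]
```

If you split the function, each role carries only its own statement. You would typically declare these per-function statements in your deployment framework's configuration rather than build them by hand.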
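
Slide 106 recommends keeping secrets out of environment variables. Instead, fetch them from SSM Parameter Store at cold start, cache them, and invalidate the cache every few minutes. Below is a sketch of that caching logic under stated assumptions:

- `ParameterCache` is a hypothetical class.
- The fetcher is injected, so the sketch runs without the AWS SDK.
- `ttlMs` stands in for the slide's "x mins".

```javascript
// Hypothetical per-container cache for secrets. A value is fetched on first use
// (typically during the cold start), reused by later invocations, and fetched
// again once it is older than ttlMs.
class ParameterCache {
  /**
   * @param {(name: string) => Promise<string>} fetchParameter loads one value, e.g. from SSM
   * @param {number} ttlMs how long a cached value is considered fresh
   * @param {() => number} now clock in ms, injectable so the TTL logic is testable
   */
  constructor(fetchParameter, ttlMs, now = () => Date.now()) {
    this.fetchParameter = fetchParameter;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // name -> { value: Promise<string>, fetchedAt: number }
  }

  get(name) {
    const entry = this.entries.get(name);
    if (entry && this.now() - entry.fetchedAt < this.ttlMs) {
      return entry.value; // still fresh: no new fetch
    }
    // Cache the promise itself so overlapping calls share a single fetch.
    const value = Promise.resolve(this.fetchParameter(name));
    this.entries.set(name, { value, fetchedAt: this.now() });
    // Evict a failed fetch so the next call retries instead of reusing the error.
    value.catch(() => {
      if (this.entries.get(name)?.value === value) this.entries.delete(name);
    });
    return value;
  }
}
```

In a Node.js Lambda function you would create one cache at module scope, outside the handler, so it lives as long as the container. You would give it a fetcher that calls SSM. With the AWS SDK for JavaScript v3, that fetcher could call `ssmClient.send(new GetParameterCommand({ Name: name, WithDecryption: true }))` and read `Parameter.Value` from the response. The middy project linked on slide 107 is an alternative if you would rather not maintain this logic yourself.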
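
Slides 115 and 116 describe what happens to a failed asynchronous invocation. Lambda retries it twice, then sends the event to the DLQ if one is configured; without a DLQ, the event is lost. The model below is deliberately simplified and synchronous:

- `deliverAsync` is hypothetical, not Lambda's implementation.
- It ignores the delays Lambda inserts between attempts.

It makes the "don't lose failed events" argument concrete:

```javascript
// Simplified model of Lambda's asynchronous invocation path (slide 115):
// one initial attempt plus up to `maxRetries` retries, then the event goes to
// the DLQ if there is one. Delays between attempts are intentionally left out.
function deliverAsync(handler, event, deadLetterQueue, maxRetries = 2) {
  const maxAttempts = maxRetries + 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      handler(event);
      return { delivered: true, attempts: attempt };
    } catch {
      // This attempt failed; loop around and retry until attempts run out.
    }
  }
  // With no DLQ configured, this is the point where the event is lost for good.
  if (deadLetterQueue) deadLetterQueue.push(event);
  return { delivered: false, attempts: maxAttempts };
}
```

In a real deployment you don't write this loop. Instead, you point the function's dead-letter configuration at an SQS queue or SNS topic; in CloudFormation, that is the `DeadLetterConfig` property of `AWS::Lambda::Function`.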
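
Slides 131 to 151 argue that cold starts are not limited to the first request. Whenever traffic needs one more concurrent execution than there are warm containers, Lambda has to start a new one. Below is a toy model of that reuse rule, with these assumptions:

- Unlike real Lambda, idle containers are never reclaimed.
- `countColdStarts` is a hypothetical helper.
- The request timings are made up.

```javascript
// Toy model of how Lambda reuses "containers" (concurrent executions).
// Each request is [start, end] in ms. A request reuses any container that is
// idle at its start time; otherwise a new container is created: a cold start.
function countColdStarts(requests) {
  const sorted = [...requests].sort((a, b) => a[0] - b[0]);
  const busyUntil = []; // one entry per container: when it next becomes free
  let coldStarts = 0;
  for (const [start, end] of sorted) {
    const idle = busyUntil.findIndex((t) => t <= start);
    if (idle === -1) {
      coldStarts += 1; // no idle container, so Lambda must start a new one
      busyUntil.push(end);
    } else {
      busyUntil[idle] = end; // warm start: reuse an existing container
    }
  }
  return coldStarts;
}

// Sequential traffic: one cold start, then every request reuses the same container.
console.log(countColdStarts([[0, 100], [150, 250], [300, 400]])); // 1
// Three overlapping requests need three containers: three cold starts, not one.
console.log(countColdStarts([[0, 100], [10, 110], [20, 120]])); // 3
```

The second case also shows why a warmer that keeps one container busy with pings (slide 150) can't help once more than one request is in flight at the same time.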
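
Slide 189 asks whether the alerts on slides 183 to 187 can be codified. Below is one possible shape: JavaScript that emits CloudFormation `AWS::CloudWatch::Alarm` resources for a few of the Lambda metrics in the `AWS/Lambda` namespace (`Throttles`, `DeadLetterErrors`, `IteratorAge`). It rests on these assumptions:

- The helper name, thresholds and periods are placeholders.
- The error rate % alert is left out, because it needs CloudWatch metric math (errors divided by invocations).

```javascript
// Hypothetical helper: returns CloudFormation AWS::CloudWatch::Alarm resources
// for one Lambda function, covering some of the metrics on slide 183.
// Thresholds and periods are placeholders; tune them for your own workload.
function lambdaAlarms(functionName, { isStreamConsumer = false } = {}) {
  const alarm = (metricName, statistic, threshold) => ({
    Type: "AWS::CloudWatch::Alarm",
    Properties: {
      AlarmName: `${functionName}-${metricName}`,
      Namespace: "AWS/Lambda",
      MetricName: metricName,
      Dimensions: [{ Name: "FunctionName", Value: functionName }],
      Statistic: statistic,
      Period: 60,
      EvaluationPeriods: 1,
      Threshold: threshold,
      ComparisonOperator: "GreaterThanThreshold",
      TreatMissingData: "notBreaching",
    },
  });

  const alarms = [
    alarm("Throttles", "Sum", 0),        // throttle count
    alarm("DeadLetterErrors", "Sum", 0), // failures to deliver events to the DLQ
  ];
  if (isStreamConsumer) {
    // Iterator age (in ms) only applies to stream sources such as Kinesis or DynamoDB Streams.
    alarms.push(alarm("IteratorAge", "Maximum", 60000));
  }
  return alarms;
}
```

You would still add `AlarmActions` (for example, an SNS topic) and merge the returned objects into your template's `Resources`. Generating the alarms per function helps keep alert coverage in step as functions are added.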

Description

Looking in from the outside, serverless seems so simple! And yet, many companies are struggling on their journey to serverless. In this talk, I highlight a number of mistakes companies are making when they adopt serverless.

Recording: https://www.youtube.com/watch?v=4hzNqxo0yTA

Real-world serverless podcast: https://realworldserverless.com
Learn Lambda best practices: https://lambdabestpractice.com
Blog: https://theburningmonk.com
Consulting services: https://theburningmonk.com/hire-me
Production-Ready Serverless workshop: https://productionreadyserverless.com

Transcript

  1. 1. MIND THE POTHOLES MIND THE POTHOLES
  2. 2. @theburningmonk theburningmonk.com back in the day…
  3. 3. we’re getting a new server in 3 months time
  4. 4. yay! about time! hooray!! finally!
  5. 5. but we have to decide what dependencies to install on it now..
  6. 6.
  7. 7. @theburningmonk theburningmonk.com on premise VMs in the cloud
  8. 8. @theburningmonk theburningmonk.com on premise VMs in the cloud abstraction level
  9. 9. @theburningmonk theburningmonk.com on premise VMs in the cloud abstraction levelproductivity
  10. 10. @theburningmonk theburningmonk.com on premise VMs in the cloud abstraction levelproductivity
  11. 11. @theburningmonk theburningmonk.com less is more
  12. 12. @theburningmonk theburningmonk.com
  13. 13. @theburningmonk theburningmonk.com
  14. 14. @theburningmonk theburningmonk.com right?
  15. 15. @theburningmonk theburningmonk.com
  16. 16. @theburningmonk theburningmonk.com “why are we failing at this?”
  17. 17. hidden dangers
  18. 18. @theburningmonk theburningmonk.com monolith microservices serverless
  19. 19. @theburningmonk theburningmonk.com monolith microservices serverless observability distributed systems bounded context
  20. 20. @theburningmonk theburningmonk.com monolith microservices serverless observability distributed systems bounded context
  21. 21. @theburningmonk theburningmonk.com monolith microservices serverless observability distributed systems bounded context event driven
  22. 22. @theburningmonk theburningmonk.com monolith serverless missing learnings from microservices
  23. 23. @theburningmonk theburningmonk.com monolith serverless missing learnings from microservices poor decisions
  24. 24. Yan Cui http://theburningmonk.com @theburningmonk AWS user for 10 years
  25. 25. http://bit.ly/yubl-serverless
  26. 26. Yan Cui http://theburningmonk.com @theburningmonk Developer Advocate @
  27. 27. Yan Cui http://theburningmonk.com @theburningmonk Independent Consultant advisetraining delivery
  28. 28. theburningmonk.com/courses
  29. 29. theburningmonk.com/workshops in your company flexible datesHelsinki, Aug 20-21 London, Sep 24-25 Berlin, Oct 8-9 4-week virtual workshop, May 4 - May 29 Amsterdam, Jul 7-8
  30. 30. @theburningmonk theburningmonk.com not letting go of legacy thinking one account that rules them all do first, research later not using a deployment framework console-driven development one repo per function unencrypted secrets in env vars not following least privilege principle missing DLQs too much/too little concurrency cold starts RDS connection handling (lack of) observability #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13
  31. 31. #1 not letting go of legacy thinking
  32. 32. “we’re doing serverless, but why aren’t thing going faster?”
  33. 33. @theburningmonk theburningmonk.com Socio Technical
  34. 34. @theburningmonk theburningmonk.com there are no silver bullets
  35. 35. @theburningmonk theburningmonk.com
  36. 36. @theburningmonk theburningmonk.com centralised team Team A Team B Team C Team D …
  37. 37. @theburningmonk theburningmonk.com “but the developers don’t understand AWS and how our infrastructure is set up”
  38. 38. @theburningmonk theburningmonk.com “but the developers don’t understand AWS and how our infrastructure is set up” let’s solve this problem instead!
  39. 39. @theburningmonk theburningmonk.com what got you here won’t get you there
  40. 40. @theburningmonk theburningmonk.com if (path == “/user” && method == “GET”) { return getUser(…); } else if (path == “/user” && method == “DELETE”) { return deleteUser(…); } else if (path == “/user” && method == “POST”) { return createUser(…); } else if …. Monolithic Functions
  41. 41. @theburningmonk theburningmonk.com GET /user POST /user DELETE /user Single-Purposed Functions
  42. 42. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user
  43. 43. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user find related functions by prefix
  44. 44. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user discoverability (without having to dig into the code)
  45. 45. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user what does it do?
  46. 46. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user dynamodb:GetItem dynamodb:PutItem dynamodb:DeleteItem
  47. 47. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user dynamodb:GetItem dynamodb:PutItem dynamodb:DeleteItem no least privilege…
  48. 48. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user require(x) require(y) require(z)
  49. 49. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user require(x) require(y) require(z)
  50. 50. @theburningmonk theburningmonk.com
  51. 51. @theburningmonk theburningmonk.com more dependecies equals slower cold start
  52. 52. @theburningmonk theburningmonk.com author: yan.cui feature: user-api user-api-dev Monolithic Single-Purposed author: yan.cui feature: user-api user-api-dev-get-user author: yan.cui feature: user-api user-api-dev-create-user author: yan.cui feature: user-api user-api-dev-delete-user require(x) require(y) require(z) worse cold start performance
  53. 53. @theburningmonk theburningmonk.com keep functions simple, and single-purposed
  54. 54. #2 one account that rules them all
  55. 55. @theburningmonk theburningmonk.com mind the shared limits
  56. 56. @theburningmonk theburningmonk.com no. of DynamoDB tables no. of API Gateway regional APIs no. of API Gateway edge-optimized APIs no. of Kinesis shards no. of IAM roles no. of S3 buckets no. of CloudFormation stacks no. of SNS subscription filters no. of SSM parameters … Resource Limits
  57. 57. @theburningmonk theburningmonk.com DynamoDB read & write API Gateway requests/second Lambda concurrent executions SSM parameter ops/second … Throughput Limits
  58. 58. @theburningmonk theburningmonk.com
  59. 59. @theburningmonk theburningmonk.com compartmentalise security breaches
  60. 60. @theburningmonk theburningmonk.com One account per Team per Environment
  61. 61. @theburningmonk theburningmonk.com
  62. 62. @theburningmonk theburningmonk.com https://github.com/OlafConijn/AwsOrganizationFormation
  63. 63. @theburningmonk theburningmonk.com https://github.com/OlafConijn/AwsOrganizationFormation
  64. 64. @theburningmonk theburningmonk.com https://github.com/OlafConijn/AwsOrganizationFormation Accounts Org Units SCPs Pwd Policies Multi-Region Pseudo-Funs Init & Validate CI/CD
  65. #3 do first, research later
  66. https://einaregilsson.com/serverless-15-percent-slower-and-eight-times-more-expensive/
  70. the platforms need to do better at educating users on how to choose between different services
  71. SNS vs SQS vs Kinesis vs MSK? the platforms need to do better at educating users on how to choose between different services
  72. Kinesis vs SQS vs SNS vs EventBridge
      ordering: Kinesis by shard; SQS none (standard), global (FIFO); SNS none; EventBridge none
      replay events: Kinesis up to 7 days; SQS none; SNS none; EventBridge none
      mode: Kinesis batched; SQS batched (up to 10); SNS singular; EventBridge singular
      retry: Kinesis retried until success (customizable); SQS retry + DLQ; SNS retry + DLQ; EventBridge retry + DLQ
      concurrency: Kinesis 1 per shard; SQS auto-scaled; SNS fan-out!!!; EventBridge fan-out!!!
      subscribers: Kinesis many; SQS one-to-one; SNS many; EventBridge many
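One way to make the comparison table actionable is to encode it as data and filter on the properties you need. A sketch only: the values mirror the slide as of the talk, SQS ordering holds for FIFO queues only, and `EventService` and `candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EventService:
    name: str
    ordered: bool           # by shard (Kinesis) or FIFO queues only (SQS)
    replay: bool            # can past events be re-read? (Kinesis: up to 7 days)
    many_subscribers: bool  # can several consumers each receive every event?


# Transcribed from the comparison table on the slide (as of the talk)
SERVICES = [
    EventService("Kinesis", ordered=True, replay=True, many_subscribers=True),
    EventService("SQS", ordered=True, replay=False, many_subscribers=False),
    EventService("SNS", ordered=False, replay=False, many_subscribers=True),
    EventService("EventBridge", ordered=False, replay=False, many_subscribers=True),
]


def candidates(need_ordering: bool = False, need_replay: bool = False,
               need_fan_out: bool = False) -> List[str]:
    """Names of services whose table entries satisfy every requirement asked for."""
    return [
        s.name for s in SERVICES
        if (s.ordered or not need_ordering)
        and (s.replay or not need_replay)
        and (s.many_subscribers or not need_fan_out)
    ]


print(candidates(need_ordering=True, need_fan_out=True))  # ['Kinesis']
print(candidates(need_fan_out=True))  # ['Kinesis', 'SNS', 'EventBridge']
```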
  73. https://medium.com/theburningmonk-com/all-my-posts-on-serverless-aws-lambda-43c17a147f91
  74. https://www.jeremydaly.com/newsletter/
  75. #4 not using a deployment toolkit
  77. https://lumigo.io/blog/comparison-of-lambda-deployment-frameworks/
  78. don’t write your own deployment framework
  79. #5 console-driven development
  82. #6 one repo per function
  83. nine github repos, one per function
  85. monorepo?
  86. a single github repo
  87. one repo per service?
  88. one github repo per service: user-api, timeline-api, relationship-api, search-api
  89. https://lumigo.io/blog/mono-repo-vs-one-per-service
  97. one github repo per service: user-api, timeline-api, relationship-api, search-api
  98. CI/CD pipeline per service
  99. functions are deployed together, as a stack
  100. #7 unencrypted secrets in env vars
  101. secrets should NEVER be in plain text in env variables
  103. Secret 1 and Secret 2 are stored in SSM Parameter Store (access controlled by IAM), then copied into each function’s environment variables (Environment: SECRET_1: …, SECRET_2: …). yay!
  106. Secret 1 and Secret 2 stay in SSM Parameter Store (access controlled by IAM); functions fetch them at cold start, cache them, and invalidate the cache every x mins
  107. https://github.com/middyjs/middy
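The fetch at cold start, cache, invalidate every x mins pattern can be sketched in Python as follows. middy does this for Node.js; this is not its API. The `fetch` callable is an assumption: in a real function it could wrap boto3's `ssm.get_parameter(Name=..., WithDecryption=True)`, but here it is injected so the sketch runs without AWS. `SecretCache` is a hypothetical name and the 300-second TTL is an arbitrary choice for x.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Caches secrets in memory for the lifetime of a container and
    re-fetches each one once it is older than `ttl_seconds`."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 300,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, fetched_at)

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is None or now - entry[1] >= self._ttl:
            # First use in this container (cold start) or the cached copy expired
            entry = (self._fetch(name), now)
            self._entries[name] = entry
        return entry[0]


# Example wiring with boto3 (not executed here):
#   ssm = boto3.client("ssm")
#   fetch = lambda n: ssm.get_parameter(Name=n, WithDecryption=True)["Parameter"]["Value"]
#   secrets = SecretCache(fetch, ttl_seconds=300)  # create at module level
```

Because the decrypted values only ever live in memory, they never show up as plain text in the function's environment variables.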
  109. SSM Parameter Store (Secret 1, Secret 2, IAM): switch to Higher Throughput if you need more than 40 ops/s
  110. #8 not following least privilege principle
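Least privilege in practice means scoping each function's IAM role to the exact actions and resources it uses, rather than something like `dynamodb:*` on `*`. A sketch that builds such a policy document as a plain dict; `least_privilege_policy` is a hypothetical helper, and the table ARN (using AWS's example account ID) is made up for illustration.

```python
import json
from typing import Dict, List


def least_privilege_policy(actions: List[str], resources: List[str]) -> Dict:
    """Build an IAM policy that allows only the given actions on the given resources."""
    for item in actions + resources:
        # Deliberately strict: reject wildcards so an over-broad grant can't slip in.
        # Some legitimate ARNs (e.g. table/users/index/*) would need an exception.
        if "*" in item:
            raise ValueError(f"wildcard not allowed: {item}")
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resources}],
    }


# e.g. a get-user function only needs to read one table
policy = least_privilege_policy(
    actions=["dynamodb:GetItem"],
    resources=["arn:aws:dynamodb:eu-west-1:123456789012:table/users"],
)
print(json.dumps(policy, indent=2))
```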
  113. #9 missing DLQs
  114. event sources by invocation type (http://amzn.to/2v7Kc3b). async: S3, SNS, SES, CloudFormation, CloudWatch Logs, CloudWatch Events, Scheduled Events, CodeCommit, AWS Config. sync: Cognito, Alexa, Lex, API Gateway. pulling: DynamoDB Stream, Kinesis Stream, SQS
  115. (same diagram, http://amzn.to/2vs2lIg) for async invocations, Lambda handles retries (twice, then DLQ)
  116. configure DLQ for async functions so you don’t lose failed events
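A toy model of the async retry behaviour on the slides: one attempt plus two retries, after which the event goes to the DLQ instead of being lost. `invoke_async` and the list standing in for the DLQ are illustrative assumptions; real Lambda waits between retries and sends failed events to an SQS queue or SNS topic.

```python
from typing import Any, Callable, List

MAX_RETRIES = 2  # Lambda retries a failed async invocation twice


def invoke_async(handler: Callable[[Any], Any], event: Any, dlq: List[Any]) -> bool:
    """Return True if some attempt succeeded; otherwise push the event to `dlq`."""
    for _ in range(1 + MAX_RETRIES):
        try:
            handler(event)
            return True
        except Exception:
            # Real Lambda waits between attempts; this model retries immediately
            continue
    dlq.append(event)  # without a DLQ, this event would simply be dropped
    return False


def always_fails(event):
    raise RuntimeError("downstream unavailable")


dlq: List[Any] = []
invoke_async(always_fails, {"order_id": 42}, dlq)
print(dlq)  # [{'order_id': 42}]
```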
  121. #10 too much/too little concurrency
  122. “Lambda generates too much load for the downstream system”
  123. SNS → Lambda: one invocation per message
  124. SNS → Lambda → Downstream System
  128. if you want… maximum throughput: SNS; precise control over throughput: Kinesis. How quickly each scales out, from fastest to slowest: SNS, SQS, then Kinesis and DynamoDB Streams
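Why Kinesis gives more precise control: with one concurrent execution per shard (as on the slide), the shard count caps how many invocations can hit the downstream system at once. A back-of-the-envelope sketch; `max_records_per_second` is a hypothetical helper, the result is an upper bound that assumes full batches and records always waiting on every shard, and it ignores Lambda's per-shard parallelization factor option.

```python
def max_records_per_second(shards: int, batch_size: int, avg_duration_s: float) -> float:
    """Upper bound on records/s a Kinesis-triggered function can push downstream,
    assuming 1 concurrent execution per shard and full batches."""
    concurrent_executions = shards  # 1 per shard
    invocations_per_second = concurrent_executions / avg_duration_s
    return invocations_per_second * batch_size


# 4 shards, batches of 100 records, 0.5 s per invocation
print(max_records_per_second(4, 100, 0.5))  # 800.0
```

With SNS there is no equivalent knob: every message is its own invocation, so load on the downstream system follows the publish rate.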
  130. #11 cold starts
  131. “cold starts only happen to the first request”
  132. function invocations are handled by a concurrent execution, i.e. a “container”
  133. a concurrent execution is like a class instance; a function invocation is like a method call on it
  134. Lambda scales the number of concurrent executions based on traffic
  135. existing “containers” are reused where possible
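The class instance analogy maps onto how a Python Lambda handler behaves: module-level code runs once per container (during the cold start), while the handler runs once per invocation and can reuse whatever the module set up. A minimal sketch; the returned fields are illustrative, not something Lambda provides.

```python
import uuid

# Module scope: runs once per container, i.e. during a cold start
CONTAINER_ID = str(uuid.uuid4())
_invocation_count = 0


def handler(event, context):
    """Runs on every invocation; module-level state survives between
    invocations that land on the same container."""
    global _invocation_count
    _invocation_count += 1
    return {
        "container_id": CONTAINER_ID,
        "cold_start": _invocation_count == 1,
        "invocations_in_container": _invocation_count,
    }
```

Two sequential invocations on the same container report the same `container_id`, with `cold_start` true only the first time; a concurrent request routed to a new container would pay its own cold start and report a new ID.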
  136. [timeline animation, slides 136 to 145: invocations arriving over time]
  150. [timeline: invocations arriving over time, with periodic pings from a Lambda warmer]
  151. Lambda warmers don’t work when you have more than one concurrent execution
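A small simulation shows both points: a cold start happens whenever a request arrives while every existing container is busy, not only for the first request, and a single warmer ping can keep at most one container warm. Assumptions: containers are never reclaimed while idle, each container serves one request at a time, and requests are (arrival time, duration) pairs; `count_cold_starts` is a hypothetical helper.

```python
from typing import List, Tuple


def count_cold_starts(requests: List[Tuple[float, float]]) -> int:
    """requests: (arrival_time, duration) pairs.
    Returns how many requests needed a brand-new container."""
    busy_until: List[float] = []  # one entry per container started so far
    cold_starts = 0
    for arrival, duration in sorted(requests):
        for i, free_at in enumerate(busy_until):
            if free_at <= arrival:  # an idle container exists: reuse it
                busy_until[i] = arrival + duration
                break
        else:  # every container is busy, so a new one must be started
            busy_until.append(arrival + duration)
            cold_starts += 1
    return cold_starts


burst = [(10.0, 1.0)] * 3            # three concurrent requests
print(count_cold_starts(burst))      # 3
warmed = [(0.0, 0.01)] + burst       # one warmer ping beforehand
print(count_cold_starts(warmed) - 1) # 2: users still hit cold starts
```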
  152. FREQUENCY, DURATION
  153. FREQUENCY (dictated by user traffic, out of your control), DURATION
  154. cold starts are generally not an issue if you have a steady traffic pattern
  155. [chart: req/s over time]
  156. [chart: req/s over time, with a spike during El Clásico]
  157. [chart: req/s over time, with peaks at lunch and dinner]
  158. FREQUENCY, DURATION (optimize this!)
  159. minimise the duration of cold starts so they fall within an acceptable latency range
  160. [chart: req/s over time with lunch and dinner peaks, Provisioned Concurrency]
  161. [chart: req/s over time with lunch and dinner peaks, Provisioned Concurrency and On-Demand Concurrency]
  162. https://lumigo.io/blog/provisioned-concurrency-the-end-of-cold-starts/
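When sizing Provisioned Concurrency for a known peak such as lunch, a common rule of thumb (Little's law, also used in the Lambda documentation) is concurrency ≈ requests per second × average duration in seconds. A sketch with made-up peak numbers; `required_concurrency` and its 10% default headroom are arbitrary assumptions.

```python
import math


def required_concurrency(requests_per_second: float, avg_duration_s: float,
                         headroom_pct: int = 10) -> int:
    """Concurrent executions needed to serve the given load (Little's law),
    plus a percentage safety margin, rounded up."""
    return math.ceil(requests_per_second * avg_duration_s * (100 + headroom_pct) / 100)


# Hypothetical lunch peak: 200 req/s at 250 ms per request
print(required_concurrency(200, 0.25))  # 55
```

Anything above the provisioned amount is still served by on-demand concurrency, with the usual cold starts.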
  163. there are no silver bullets
  164. Provisioned Concurrency is a powerful tool if (and only if) you have a cold start problem; don’t use it by default
  165. #12 RDS connection handling
  166. default RDS configs are bad for Lambda
  167. default RDS configs are bad for Lambda: idle connections are not closed, there are too many connections per “container”, and the max open connections limit is too low
  168. https://www.jeremydaly.com/manage-rds-connections-aws-lambda/
  169. set “wait_timeout” and “interactive_timeout” to 10 mins (default is 8 hours!)
  170. increase the “max_connections” setting
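The timeout settings are in seconds, so 10 minutes is 600 (MySQL's default of 28,800 is the 8 hours). A sketch that builds the `Parameters` list accepted by boto3's `rds.modify_db_parameter_group`; the helper name, the parameter group name and the `max_connections` value are placeholders, since the slide gives no number. Check that each parameter is dynamic for your engine version; static ones need `"pending-reboot"` instead of `"immediate"`.

```python
from typing import Dict, List


def lambda_friendly_mysql_params(max_connections: int,
                                 idle_timeout_s: int = 600) -> List[Dict[str, str]]:
    """Parameter overrides for a MySQL parameter group used by Lambda clients."""
    settings = {
        "wait_timeout": idle_timeout_s,         # close idle non-interactive connections
        "interactive_timeout": idle_timeout_s,  # same for interactive connections
        "max_connections": max_connections,     # raise the cap on open connections
    }
    return [
        {"ParameterName": name, "ParameterValue": str(value), "ApplyMethod": "immediate"}
        for name, value in settings.items()
    ]


# Applying it (requires AWS credentials; not executed here):
#   boto3.client("rds").modify_db_parameter_group(
#       DBParameterGroupName="my-lambda-mysql",  # hypothetical name
#       Parameters=lambda_friendly_mysql_params(max_connections=1000))
```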
  171. set the client socket pool size to 1
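A pool size of 1 follows from the concurrency model: each container handles one invocation at a time, so it never needs more than one connection. A sketch of one lazily created connection per container, reused across invocations and re-opened if it has been closed (for example by `wait_timeout`). `sqlite3` stands in for a MySQL driver purely so the example runs anywhere; with a real driver you would swap in its connect call and its own error type.

```python
import sqlite3
from typing import Optional

# One connection per container: created lazily, reused by later invocations
_connection: Optional[sqlite3.Connection] = None


def get_connection() -> sqlite3.Connection:
    global _connection
    if _connection is not None:
        try:
            _connection.execute("SELECT 1")  # cheap liveness check
            return _connection
        except sqlite3.ProgrammingError:
            # sqlite3 raises this on a closed connection; a MySQL driver
            # would raise its own error when the server has dropped it
            _connection = None
    _connection = sqlite3.connect(":memory:")  # stand-in for a MySQL connect()
    return _connection


def handler(event, context):
    conn = get_connection()
    (value,) = conn.execute("SELECT 1 + 1").fetchone()
    return {"result": value}
```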
  174. #13 (lack of) observability
  175. [timeline: issue happened, user impact, system repaired] reduce MTTR
  176. Identify & Resolve Issues, Understanding costs, Visibility
  178. [timeline: issue happened, user impact, system repaired] MTTDiscovery (mean time to discovery)
  180. “What alerts should I have?”
  181. It depends on what you’re building…
  182. But, this is a good starting point
  187. Lambda: error rate %, throttle count, DLQ error count, iterator age, regional concurrency
       API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %
       SQS: message age
       Step Functions: failed count, throttle count, timed out count
  188. monitor and alert on message flow rate for event processing pipelines
  189. “Can’t you codify these?”
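The Lambda alerts listed above can be codified. A sketch that generates keyword arguments for boto3's `cloudwatch.put_metric_alarm` on the `AWS/Lambda` metrics `Errors`, `Throttles`, `DeadLetterErrors` and `IteratorAge` for one function. The thresholds and period are placeholders, `lambda_alarms` is a hypothetical helper, and alerting on error *rate* rather than count would need metric math, which this sketch leaves out.

```python
from typing import Dict, List

# metric name -> (statistic, threshold); the thresholds are placeholders
LAMBDA_ALERTS = {
    "Errors": ("Sum", 1),
    "Throttles": ("Sum", 1),
    "DeadLetterErrors": ("Sum", 1),
    "IteratorAge": ("Maximum", 60_000),  # milliseconds; stream-triggered functions only
}


def lambda_alarms(function_name: str, period_s: int = 60) -> List[Dict]:
    """Build put_metric_alarm kwargs for one Lambda function."""
    return [
        {
            "AlarmName": f"{function_name}-{metric}",
            "Namespace": "AWS/Lambda",
            "MetricName": metric,
            "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
            "Statistic": statistic,
            "Period": period_s,
            "EvaluationPeriods": 1,
            "Threshold": float(threshold),
            "ComparisonOperator": "GreaterThanOrEqualToThreshold",
            "TreatMissingData": "notBreaching",
        }
        for metric, (statistic, threshold) in LAMBDA_ALERTS.items()
    ]


# Applying them (requires AWS credentials; not executed here):
#   cloudwatch = boto3.client("cloudwatch")
#   for alarm in lambda_alarms("user-api-dev-get-user"):
#       cloudwatch.put_metric_alarm(**alarm)
```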
  191. #1 not letting go of legacy thinking
       #2 one account that rules them all
       #3 do first, research later
       #4 not using a deployment framework
       #5 console-driven development
       #6 one repo per function
       #7 unencrypted secrets in env vars
       #8 not following least privilege principle
       #9 missing DLQs
       #10 too much/too little concurrency
       #11 cold starts
       #12 RDS connection handling
       #13 (lack of) observability
  192. https://theburningmonk.com/hire-me Advise, Training, Delivery. “Fundamentally, Yan has improved our team by increasing our ability to derive value from AWS and Lambda in particular.” Nick Blair, Tech Lead
  193. Production-Ready Serverless
  194. theburningmonk.com/workshops: in your company, flexible dates; 4-week virtual workshop, May 4 - May 29; Amsterdam, Jul 7-8; Helsinki, Aug 20-21; London, Sep 24-25; Berlin, Oct 8-9. slsdays-virtual-202004: €100 off all my workshops
  195. lambdabestpractice.com, bit.ly/complete-guide-to-aws-step-functions: 20% off my courses with slsdays-virtual-202004
  196. github.com/theburningmonk
