
Patterns and practices for building resilient Serverless applications

Recording: https://www.youtube.com/watch?v=pSfKZRv3nhY

Real-world serverless podcast: https://realworldserverless.com
Learn Lambda best practices: https://lambdabestpractice.com
Blog: https://theburningmonk.com
Consulting services: https://theburningmonk.com/hire-me
Production-Ready Serverless workshop: https://productionreadyserverless.com


Patterns and practices for building resilient Serverless applications

  1. 1. Patterns and Practices for building resilient serverless applications presented by Yan Cui @theburningmonk
  2. 2. @theburningmonk theburningmonk.com
  3. 3. @theburningmonk theburningmonk.com “the capacity to recover quickly from difficulties; toughness.” resilience /rɪˈzɪlɪəns/ noun
  4. 4. @theburningmonk theburningmonk.com “the capacity to recover quickly from difficulties; toughness.” resilience /rɪˈzɪlɪəns/ noun it’s not about preventing failures!
  5. 5. everything fails, all the time
  6. 6. @theburningmonk theburningmonk.com we need to build applications that can withstand failures
  7. 7. @theburningmonk theburningmonk.com
  8. 8. @theburningmonk theburningmonk.com don’t run your application on one server…
  9. 9. @theburningmonk theburningmonk.com entire data centers can go down…
  10. 10. @theburningmonk theburningmonk.com run your application in multiple AZs and regions
  11. 11. @theburningmonk theburningmonk.com Failures on load: exhaustion of resources
  12. 12. @theburningmonk theburningmonk.com Failures on load: exhaustion of resources
  13. 13. @theburningmonk theburningmonk.com Failures on load: exhaustion of resources (chart of latency vs. reqs/s; CPU saturation)
  14. 14. @theburningmonk theburningmonk.com Failures in distributed systems Service A Service B Service C user
  15. 15. @theburningmonk theburningmonk.com Failures in distributed systems Service A Service B Service C user
  16. 16. @theburningmonk theburningmonk.com Failures in distributed systems Service A Service B Service C user HTTP 502
  17. 17. @theburningmonk theburningmonk.com Failures in distributed systems Service A Service B Service C user You suck!
  18. 18. @theburningmonk theburningmonk.com microservices death stars circa 2015
  19. 19. Yan Cui http://theburningmonk.com @theburningmonk AWS user for 10 years
  20. 20. Yan Cui http://theburningmonk.com @theburningmonk http://bit.ly/yubl-serverless
  21. 21. Yan Cui http://theburningmonk.com @theburningmonk Developer Advocate @
  22. 22. Yan Cui http://theburningmonk.com @theburningmonk Independent Consultant: advise, training, delivery
  23. 23. by Uwe Friedrichsen
  24. 24. @theburningmonk theburningmonk.com Lambda execution environment
  25. 25. @theburningmonk theburningmonk.com Serverless - multiple AZ’s out of the box Total resources created: 1 API Gateway 1 Lambda
  26. 26. @theburningmonk theburningmonk.com Serverless - multiple AZ’s out of the box Total resources created: 1 API Gateway 1 Lambda don’t pay for idle redundant resources!
  27. 27. @theburningmonk theburningmonk.com Load balancing
  28. 28. @theburningmonk theburningmonk.com Data replication in different AZ’s DynamoDB Global Tables
  29. 29. @theburningmonk theburningmonk.com There is throttling everywhere!
  30. 30. @theburningmonk theburningmonk.com Beware of timeout mismatch: API Gateway integration timeout (default: 29s) vs. Lambda timeout (max: 15 minutes)
  31. 31. @theburningmonk theburningmonk.com Beware of timeout mismatch: Lambda timeout (max: 15 minutes) vs. SQS visibility timeout (default: 30s, min: 0s, max: 12 hours)
  32. 32. @theburningmonk theburningmonk.com Beware of timeout mismatch: Lambda timeout (max: 15 minutes) vs. SQS visibility timeout (default: 30s, min: 0s, max: 12 hours); set VisibilityTimeout to 6x the Lambda timeout
  33. 33. @theburningmonk theburningmonk.com Offload computing operations to queues
  34. 34. @theburningmonk theburningmonk.com Offload computing operations to queues
  35. 35. @theburningmonk theburningmonk.com Offload computing operations to queues better absorb downstream problems
  36. 36. @theburningmonk theburningmonk.com Offload computing operations to queues need way to replay DLQ events
  37. 37. https://www.npmjs.com/package/lumigo-cli
  38. 38. @theburningmonk theburningmonk.com Offload computing operations to queues great for fire-and-forget tasks
  39. 39. @theburningmonk theburningmonk.com “what if the client is waiting for a response?”
  40. 40. @theburningmonk theburningmonk.com “Decoupled Invocation”
  41. 41. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx <null> … … … task results not ready…
  42. 42. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx <null> … … … task results not ready… 202
  43. 43. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx <null> … … … task results reporting for duty!
  44. 44. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx <null> … … … task results working hard… not ready…
  45. 45. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx <null> … … … task results 202 working hard…
  46. 46. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx { … } … … … task results done!
  47. 47. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx { … } … … … task results done!
  48. 48. @theburningmonk theburningmonk.com task id created at result xxx xxx <null> xxx xxx { … } … … … task results 200 { … }
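
To make the 202/200 flow above concrete, here is a minimal sketch of the "Decoupled Invocation" pattern with the AWS SDK for JavaScript v2: one function accepts the request, records a task with a null result and enqueues the work; another returns 202 until the result is filled in. The table, queue URL and field names are hypothetical.

    const AWS = require('aws-sdk')
    const { v4: uuid } = require('uuid')
    const dynamodb = new AWS.DynamoDB.DocumentClient()
    const sqs = new AWS.SQS()

    const TABLE_NAME = process.env.TASKS_TABLE
    const QUEUE_URL = process.env.TASKS_QUEUE_URL

    // POST /tasks: accept the request, enqueue the work, reply 202 with a task id
    module.exports.submit = async (event) => {
      const taskId = uuid()
      await dynamodb.put({
        TableName: TABLE_NAME,
        Item: { taskId, createdAt: Date.now(), result: null }
      }).promise()
      await sqs.sendMessage({
        QueueUrl: QUEUE_URL,
        MessageBody: JSON.stringify({ taskId, payload: JSON.parse(event.body) })
      }).promise()
      return { statusCode: 202, body: JSON.stringify({ taskId }) }
    }

    // GET /tasks/{taskId}: 202 while the worker is still busy, 200 once the result is in
    module.exports.status = async (event) => {
      const { taskId } = event.pathParameters
      const { Item } = await dynamodb.get({ TableName: TABLE_NAME, Key: { taskId } }).promise()
      if (!Item || Item.result === null) {
        return { statusCode: 202, body: JSON.stringify({ taskId, status: 'pending' }) }
      }
      return { statusCode: 200, body: JSON.stringify(Item.result) }
    }
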
  49. 49. @theburningmonk theburningmonk.com wait…
  50. 50. @theburningmonk theburningmonk.com a distributed transaction!
  51. 51. @theburningmonk theburningmonk.com a distributed transaction! needs rollback
  52. 52. @theburningmonk theburningmonk.com no distributed transactions
  53. 53. @theburningmonk theburningmonk.com do the work here
  54. 54. @theburningmonk theburningmonk.com retry-until-success
  55. 55. @theburningmonk theburningmonk.com
  56. 56. @theburningmonk theburningmonk.com 24 hours data retention
  57. 57. @theburningmonk theburningmonk.com 24 hours data retention need alerting to ensure issues are addressed quickly
  58. 58. @theburningmonk theburningmonk.com retry-until-success needs to deal with poison messages
  59. 59. @theburningmonk theburningmonk.com what if you can’t avoid distributed transactions?
  60. 60. @theburningmonk theburningmonk.com The Saga pattern A pattern for managing failures where each action has a compensating action for rollback
  61. 61. @theburningmonk theburningmonk.com The Saga pattern https://www.youtube.com/watch?v=xDuwrtwYHu8
  62. 62. @theburningmonk theburningmonk.com The Saga pattern Begin transaction Start book hotel request End book hotel request Start book flight request End book flight request Start book car rental request End book car rental request End transaction
  63. 63. @theburningmonk theburningmonk.com The Saga pattern model both actions and compensating actions as Lambda functions
  64. 64. @theburningmonk theburningmonk.com The Saga pattern use Step Functions as the coordinator for the saga
  65. 65. @theburningmonk theburningmonk.com The Saga pattern Input
  66. 66. @theburningmonk theburningmonk.com The Saga pattern
  67. 67. @theburningmonk theburningmonk.com The Saga pattern
  68. 68. @theburningmonk theburningmonk.com The Saga pattern
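
The hotel/flight/car-rental saga from the slides could be expressed as a Step Functions state machine along these lines: every booking step has a Catch that routes to the compensating actions, applied in reverse order. A sketch only; all Lambda ARNs are hypothetical placeholders.

    // Amazon States Language definition, written as a JS object for readability
    const sagaDefinition = {
      StartAt: 'BookHotel',
      States: {
        BookHotel: {
          Type: 'Task',
          Resource: 'arn:aws:lambda:us-east-1:123456789012:function:book-hotel',
          Catch: [{ ErrorEquals: ['States.ALL'], Next: 'BookingFailed' }],
          Next: 'BookFlight'
        },
        BookFlight: {
          Type: 'Task',
          Resource: 'arn:aws:lambda:us-east-1:123456789012:function:book-flight',
          Catch: [{ ErrorEquals: ['States.ALL'], Next: 'CancelHotel' }],
          Next: 'BookCarRental'
        },
        BookCarRental: {
          Type: 'Task',
          Resource: 'arn:aws:lambda:us-east-1:123456789012:function:book-car-rental',
          Catch: [{ ErrorEquals: ['States.ALL'], Next: 'CancelFlight' }],
          Next: 'BookingSucceeded'
        },
        // compensating actions, applied in reverse order
        CancelFlight: {
          Type: 'Task',
          Resource: 'arn:aws:lambda:us-east-1:123456789012:function:cancel-flight',
          Next: 'CancelHotel'
        },
        CancelHotel: {
          Type: 'Task',
          Resource: 'arn:aws:lambda:us-east-1:123456789012:function:cancel-hotel',
          Next: 'BookingFailed'
        },
        BookingSucceeded: { Type: 'Succeed' },
        BookingFailed: { Type: 'Fail', Error: 'BookingError', Cause: 'saga rolled back' }
      }
    }

    module.exports = sagaDefinition
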
  69. 69. @theburningmonk theburningmonk.com retry-until-success needs to deal with poison messages Mind the poison message
  70. 70. @theburningmonk theburningmonk.com Mind the poison message
  71. 71. @theburningmonk theburningmonk.com Mind the poison message
  72. 72. @theburningmonk theburningmonk.com Mind the poison message
  73. 73. @theburningmonk theburningmonk.com Mind the poison message 6, 3, 1, 1, 1, 1, …
  74. 74. @theburningmonk theburningmonk.com Mind the poison message 6, 3, 1, 1, 1, 1, … only count the “same” batch
  75. 75. @theburningmonk theburningmonk.com Mind the poison message
  76. 76. @theburningmonk theburningmonk.com Mind the poison message have to fetch from the stream
  77. 77. @theburningmonk theburningmonk.com Mind the poison message have to fetch from the stream do it before they expire from the stream!
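
One way to apply the "only count the same batch" idea in a Kinesis-triggered function is to remember the batch's sequence-number range in the execution environment and only give up once that exact batch has failed a few times. A rough sketch; the DLQ URL, the threshold and processRecord are hypothetical, and the counter resets whenever a fresh container starts.

    const AWS = require('aws-sdk')
    const sqs = new AWS.SQS()

    const DLQ_URL = process.env.POISON_DLQ_URL
    const MAX_ATTEMPTS = 3

    let lastBatchKey = null
    let attempts = 0

    module.exports.handler = async (event) => {
      const records = event.Records
      // identify the batch by its first and last sequence numbers
      const batchKey = `${records[0].kinesis.sequenceNumber}:${records[records.length - 1].kinesis.sequenceNumber}`
      attempts = batchKey === lastBatchKey ? attempts + 1 : 1
      lastBatchKey = batchKey

      if (attempts > MAX_ATTEMPTS) {
        // park the whole batch in a DLQ so the shard can move on
        await Promise.all(records.map(r => sqs.sendMessage({
          QueueUrl: DLQ_URL,
          MessageBody: Buffer.from(r.kinesis.data, 'base64').toString()
        }).promise()))
        return
      }

      for (const record of records) {
        const payload = JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString())
        await processRecord(payload) // throwing here makes Lambda retry the same batch
      }
    }

    async function processRecord(payload) {
      // domain logic goes here
    }
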
  78. 78. @theburningmonk theburningmonk.com how do you prevent building up an insurmountable backlog?
  79. 79. @theburningmonk theburningmonk.com Load shedding implement load shedding prioritize newer messages with a better chance to succeed
  80. 80. @theburningmonk theburningmonk.com Load shedding excess load is sent to DLQ
  81. 81. @theburningmonk theburningmonk.com Load shedding process with a delay
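
A minimal load-shedding sketch for an SQS-triggered function: anything that has already waited longer than a cut-off is shed to a separate queue, so capacity goes to newer messages with a better chance to succeed. The queue URL, the 5-minute cut-off and processMessage are hypothetical.

    const AWS = require('aws-sdk')
    const sqs = new AWS.SQS()

    const SHED_DLQ_URL = process.env.SHED_DLQ_URL
    const MAX_AGE_MS = 5 * 60 * 1000 // shed anything older than 5 minutes

    module.exports.handler = async (event) => {
      for (const record of event.Records) {
        const ageMs = Date.now() - Number(record.attributes.SentTimestamp)
        if (ageMs > MAX_AGE_MS) {
          // excess load goes to the DLQ instead of being processed
          await sqs.sendMessage({ QueueUrl: SHED_DLQ_URL, MessageBody: record.body }).promise()
          continue
        }
        await processMessage(JSON.parse(record.body))
      }
      // when the invocation succeeds, Lambda deletes the whole batch from the source queue
    }

    async function processMessage(message) {
      // domain logic goes here
    }
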
  82. 82. @theburningmonk theburningmonk.com Mind the partial failures LambdaSQS
  83. 83. @theburningmonk theburningmonk.com Mind the partial failures LambdaSQS Poller
  84. 84. @theburningmonk theburningmonk.com LambdaSQS Poller Mind the partial failures Delete
  85. 85. @theburningmonk theburningmonk.com Mind the partial failures LambdaSQS Poller Error
  86. 86. @theburningmonk theburningmonk.com Mind the partial failures LambdaSQS Poller Error DLQ
  87. 87. @theburningmonk theburningmonk.com Mind the partial failures LambdaSQS Poller Error DLQ batch fails as a unit
  88. 88. https://lumigo.io/blog/sqs-and-lambda-the-missing-guide-on-failure-modes Mind the partial failures
  89. 89. @theburningmonk theburningmonk.com Mind the partial failures
  90. 90. @theburningmonk theburningmonk.com Mind the partial failures
  91. 91. @theburningmonk theburningmonk.com Mind the partial failures
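
Because an SQS batch fails (and is retried) as a unit, one common mitigation at the time of this talk was to delete each message yourself as soon as it succeeds and only then let the invocation fail, so a retry replays just the messages still on the queue. A sketch; the queue URL and processMessage are hypothetical. (Newer Lambda versions also support partial batch responses, which achieve the same thing declaratively.)

    const AWS = require('aws-sdk')
    const sqs = new AWS.SQS()

    const QUEUE_URL = process.env.QUEUE_URL

    module.exports.handler = async (event) => {
      let failures = 0
      for (const record of event.Records) {
        try {
          await processMessage(JSON.parse(record.body))
          // success: delete the message ourselves so a later batch failure won't replay it
          await sqs.deleteMessage({ QueueUrl: QUEUE_URL, ReceiptHandle: record.receiptHandle }).promise()
        } catch (err) {
          failures++
        }
      }
      if (failures > 0) {
        // the batch still fails as a unit, but only the un-deleted messages come back
        throw new Error(`failed to process ${failures} message(s)`)
      }
    }

    async function processMessage(message) {
      // domain logic goes here
    }
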
  92. 92. @theburningmonk theburningmonk.com Mind the retry storm Service A
  93. 93. @theburningmonk theburningmonk.com Mind the retry storm Service A
  94. 94. @theburningmonk theburningmonk.com Mind the retry storm Service A retry retry retry retry
  95. 95. @theburningmonk theburningmonk.com Mind the retry storm Service A
  96. 96. @theburningmonk theburningmonk.com Mind the retry storm Service A
  97. 97. @theburningmonk theburningmonk.com Mind the retry storm Service A
  98. 98. @theburningmonk theburningmonk.com Mind the retry storm Service A
  99. 99. @theburningmonk theburningmonk.com retry storm
  100. 100. @theburningmonk theburningmonk.com circuit breaker pattern After X consecutive timeouts, trip the circuit
  101. 101. @theburningmonk theburningmonk.com circuit breaker pattern After X consecutive timeouts, trip the circuit When circuit is open, fail fast
  102. 102. @theburningmonk theburningmonk.com circuit breaker pattern When circuit is open, fail fast but, allow 1 request through every Y mins After X consecutive timeouts, trip the circuit
  103. 103. @theburningmonk theburningmonk.com circuit breaker pattern When circuit is open, fail fast but, allow 1 request through every Y mins If request succeeds, close the circuit After X consecutive timeouts, trip the circuit
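
An in-memory circuit breaker that follows the four rules above can be just a few lines of state per container. A sketch; the thresholds and the wrapped call are hypothetical.

    const MAX_CONSECUTIVE_FAILURES = 5  // "X consecutive timeouts"
    const RETRY_INTERVAL_MS = 60 * 1000 // "1 request through every Y mins"

    let consecutiveFailures = 0
    let openedAt = null

    async function callWithBreaker(fn) {
      if (openedAt !== null && Date.now() - openedAt < RETRY_INTERVAL_MS) {
        throw new Error('circuit is open, failing fast') // fail fast while open
      }
      // either the circuit is closed, or we're letting one probe request through
      try {
        const result = await fn()
        consecutiveFailures = 0
        openedAt = null // the probe succeeded, close the circuit
        return result
      } catch (err) {
        consecutiveFailures += 1
        if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
          openedAt = Date.now() // trip (or re-open) the circuit
        }
        throw err
      }
    }

    module.exports = { callWithBreaker }
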
  104. 104. @theburningmonk theburningmonk.com
  105. 105. @theburningmonk theburningmonk.com where do I keep the state of the circuit?
  106. 106. @theburningmonk theburningmonk.com in-memory Service A isOpen: false isOpen: false isOpen: false isOpen: false
  107. 107. @theburningmonk theburningmonk.com in-memory Service A isOpen: true isOpen: false isOpen: true isOpen: false
  108. 108. @theburningmonk theburningmonk.com in-memory PROS simplicity
  109. 109. @theburningmonk theburningmonk.com in-memory PROS simplicity no dependency on an external service (which would need another circuit breaker to protect it… plus cost & maintenance overhead (IAM, infra, etc.))
  110. 110. @theburningmonk theburningmonk.com in-memory PROS simplicity no dependency on external service CONS takes longer & more requests to stop all traffic
  111. 111. @theburningmonk theburningmonk.com in-memory PROS simplicity no dependency on external service CONS takes longer & more requests to stop all traffic new containers would generate more traffic
  112. 112. @theburningmonk theburningmonk.com external service Service A isOpen: false
  113. 113. @theburningmonk theburningmonk.com external service Service A isOpen: true
  114. 114. @theburningmonk theburningmonk.com external service Service A isOpen: true
  115. 115. @theburningmonk theburningmonk.com external service PROS minimizes no. of total requests to trip circuit new containers respect collective decision CONS complexity dependency on an external service
  116. 116. @theburningmonk theburningmonk.com which approach should I use?
  117. 117. @theburningmonk theburningmonk.com which approach should I use? It depends. Maybe start with the simplest solution first?
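
If the simple in-memory version isn't enough, the same state can live in an external store so every container respects the collective decision, at the cost of an extra dependency and a read per call. A DynamoDB-backed sketch; the table name, key schema and retry interval are hypothetical.

    const AWS = require('aws-sdk')
    const dynamodb = new AWS.DynamoDB.DocumentClient()

    const TABLE_NAME = process.env.CIRCUIT_TABLE
    const RETRY_INTERVAL_MS = 60 * 1000

    async function isOpen(circuitId) {
      const { Item } = await dynamodb.get({ TableName: TABLE_NAME, Key: { circuitId } }).promise()
      if (!Item || !Item.isOpen) {
        return false
      }
      // once the retry interval has passed, report "closed" so a probe request gets through
      return Date.now() - Item.openedAt < RETRY_INTERVAL_MS
    }

    async function setCircuit(circuitId, open) {
      await dynamodb.put({
        TableName: TABLE_NAME,
        Item: { circuitId, isOpen: open, openedAt: open ? Date.now() : null }
      }).promise()
    }

    module.exports = { isOpen, setCircuit }
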
  118. 118. @theburningmonk theburningmonk.com Lambda autoscaling. Burst concurrency limits: 3000 (US West (Oregon), US East (N. Virginia), Europe (Ireland)); 1000 (Asia Pacific (Tokyo), Europe (Frankfurt)); 500 (other regions). Then: 500 additional instances per minute.
  119. 119. @theburningmonk theburningmonk.com Lambda autoscaling. Burst concurrency limits: 3000 (US West (Oregon), US East (N. Virginia), Europe (Ireland)); 1000 (Asia Pacific (Tokyo), Europe (Frankfurt)); 500 (other regions). Then: 500 additional instances per minute. Standard burst concurrency limits apply when over the provisioned capacity.
  120. 120. @theburningmonk theburningmonk.com Lambda autoscaling. Burst concurrency limits: 3000 (US West (Oregon), US East (N. Virginia), Europe (Ireland)); 1000 (Asia Pacific (Tokyo), Europe (Frankfurt)); 500 (other regions). Then: 500 additional instances per minute. Standard burst concurrency limits apply when over the provisioned capacity; provisioned capacity is adjustable based on CloudWatch metrics.
  121. 121. @theburningmonk theburningmonk.com Lambda limitations & throttling: concurrent executions: 1000*; timeout: 15 minutes; burst concurrency: 500-3000; then 500 new instances per minute (* can be increased with a support ticket)
  122. 122. @theburningmonk theburningmonk.com Lambda limitations & throttling: good for spiky traffic, up to a point. Concurrent executions: 1000*; timeout: 15 minutes; burst concurrency: 500-3000; then 500 new instances per minute (* can be increased with a support ticket)
  123. 123. @theburningmonk theburningmonk.com “what if my traffic is more spiky than that?”
  124. 124. @theburningmonk theburningmonk.com Scenario: predictable spikes: holidays, weekends, celebrations (Black Friday); planned launch of resources (new series available); sport events
  125. 125. @theburningmonk theburningmonk.com Scenario: predictable spikes scheduled auto-scaling
  126. 126. @theburningmonk theburningmonk.com Scenario: predictable spikes scheduled auto-scaling: the burst limits still apply, so factor in the timing
  127. 127. @theburningmonk theburningmonk.com Scenario: predictable spikes
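
Scheduled auto-scaling of provisioned concurrency can be set up through Application Auto Scaling. A sketch with the AWS SDK for JavaScript v2; the function name, alias, dates and capacities are hypothetical, and the scale-up is scheduled well before the event because the burst limits still apply.

    const AWS = require('aws-sdk')
    const autoscaling = new AWS.ApplicationAutoScaling()

    const RESOURCE_ID = 'function:checkout-service:live' // function:<name>:<alias>
    const DIMENSION = 'lambda:function:ProvisionedConcurrency'

    async function scheduleBlackFridayCapacity() {
      await autoscaling.registerScalableTarget({
        ServiceNamespace: 'lambda',
        ResourceId: RESOURCE_ID,
        ScalableDimension: DIMENSION,
        MinCapacity: 10,
        MaxCapacity: 1000
      }).promise()

      // scale up ahead of the spike...
      await autoscaling.putScheduledAction({
        ServiceNamespace: 'lambda',
        ScheduledActionName: 'black-friday-scale-up',
        ResourceId: RESOURCE_ID,
        ScalableDimension: DIMENSION,
        Schedule: 'at(2020-11-27T06:00:00)',
        ScalableTargetAction: { MinCapacity: 500 }
      }).promise()

      // ...and back down afterwards
      await autoscaling.putScheduledAction({
        ServiceNamespace: 'lambda',
        ScheduledActionName: 'black-friday-scale-down',
        ResourceId: RESOURCE_ID,
        ScalableDimension: DIMENSION,
        Schedule: 'at(2020-11-28T06:00:00)',
        ScalableTargetAction: { MinCapacity: 10 }
      }).promise()
    }
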
  128. 128. @theburningmonk theburningmonk.com Scenario: unpredictable spikes: traffic generated by user actions (e.g. Jennifer Aniston's first post)
  129. 129. @theburningmonk theburningmonk.com “if Lambda scaling is the problem…”
  130. 130. @theburningmonk theburningmonk.com Client only needs an acknowledgement
  131. 131. https://lumigo.io/blog/the-why-when-and-how-of-api-gateway-service-proxies
  132. 132. @theburningmonk theburningmonk.com multi-region, active-active
  133. 133. @theburningmonk theburningmonk.com us-east-1 API Gateway Lambda DynamoDBRoute53
  134. 134. @theburningmonk theburningmonk.com eu-west-1 us-east-1 us-west-1
  135. 135. @theburningmonk theburningmonk.com eu-west-1 us-east-1 us-west-1 GlobalTable
  136. 136. @theburningmonk theburningmonk.com eu-west-1 us-east-1 us-west-1 GlobalTable
  137. 137. @theburningmonk theburningmonk.com eu-central-1 us-east-1 us-east-1 SQS Lambda DynamoDB Lambda API Gateway SNS SNS
  138. 138. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway eu-central-1 us-east-1 SNS SNS
  139. 139. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway eu-central-1 us-east-1 SNS SNS
  140. 140. https://lumigo.io/blog/amazon-builders-library-in-focus-5-static-stability-using-availability-zones
  141. 141. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway eu-central-1 us-east-1 SNS SNS Dedupe
  142. 142. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway us-east-1 SNS eu-central-1 SNS eu-central-1 SQS Lambda DynamoDB Lambda API Gateway Global Table
  143. 143. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway us-east-1 SNS eu-central-1 SNS eu-central-1 SQS Lambda DynamoDB Lambda API Gateway Global Table
  144. 144. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway us-east-1 SNS eu-central-1 SNS eu-central-1 SQS Lambda DynamoDB Lambda API Gateway Global Table
  145. 145. @theburningmonk theburningmonk.com us-east-1 SQS Lambda DynamoDB Lambda API Gateway us-east-1 SNS eu-central-1 SNS eu-central-1 SQS Lambda DynamoDB Lambda API Gateway Global Table
  146. 146. @theburningmonk theburningmonk.com Multi-region architecture - benefits & tradeoffs: protection against regional failures; higher complexity; very hard to test
  147. 147. CHAOS ENGINEERING
  148. 148. MUST KILL SERVERS! RAWR!! RAWR!!
  149. 149. @theburningmonk theburningmonk.com “the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production” principlesofchaos.org
  150. 150. @theburningmonk theburningmonk.com “You don't choose the moment, the moment chooses you! You only choose how prepared you are when it does.” Fire Chief Mike Burtch
  151. 151. @theburningmonk theburningmonk.com identify weaknesses before they manifest in system-wide, aberrant behaviors GOAL
  152. 152. @theburningmonk theburningmonk.com learn about the system’s behavior by observing it during a controlled experiment HOW
  153. 153. @theburningmonk theburningmonk.com learn about the system’s behavior by observing it during a controlled experiment HOW game days failure injection
  154. 154. @theburningmonk theburningmonk.com MUST KILL SERVERS! RAWR!! RAWR!! ahhhhhhh!!!! HELP!!! OMG!!! F***!!!
  155. 155. @theburningmonk theburningmonk.com phew!
  156. 156. @theburningmonk theburningmonk.com STEP 1. define steady state i.e. “what does normal look like”
  157. 157. @theburningmonk theburningmonk.com STEP 2. hypothesize that the steady state continues in both the control and the experimental group, e.g. “the system stays up if a server dies”
  158. 158. @theburningmonk theburningmonk.com STEP 3. inject realistic failures e.g. “slow response from 3rd-party service”
  159. 159. @theburningmonk theburningmonk.com STEP 4. try to disprove the hypothesis, i.e. “look for differences between the control and the experimental group”
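
For step 3, failure injection in a serverless function is often just a thin wrapper around the handler that, under a config flag, adds latency or throws. A toy sketch assuming env-var configuration; in practice you would drive this from something like SSM Parameter Store so it can be toggled without a redeploy.

    const FAILURE_MODE = process.env.FAILURE_MODE // 'latency' | 'error' | undefined
    const INJECTED_LATENCY_MS = Number(process.env.INJECTED_LATENCY_MS || 1000)

    const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms))

    function withFailureInjection(handler) {
      return async (event, context) => {
        if (FAILURE_MODE === 'latency') {
          await sleep(INJECTED_LATENCY_MS) // simulate a slow 3rd-party dependency
        } else if (FAILURE_MODE === 'error') {
          throw new Error('injected failure')
        }
        return handler(event, context)
      }
    }

    module.exports = { withFailureInjection }
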
  160. 160. DON’T START EXPERIMENTS IN PRODUCTION
  161. 161. @theburningmonk theburningmonk.com identify weaknesses before they manifest in system-wide, aberrant behaviors GOAL
  162. 162. @theburningmonk theburningmonk.com “Corporation X lost millions due to a chaos experiment gone wrong that destroyed key infrastructure, resulting in hours of downtime and unrecoverable data loss.”
  163. 163. @theburningmonk theburningmonk.com Chaos Engineering doesn't cause problems. It reveals them. Nora Jones
  164. 164. CONTAINMENT
  165. 165. CONTAINMENT run experiments during office hours
  166. 166. CONTAINMENT run experiments during office hours let others know what you’re doing, no surprises
  167. 167. CONTAINMENT run experiments during office hours let others know what you’re doing, no surprises avoid important dates
  168. 168. CONTAINMENT run experiments during office hours let others know what you’re doing, no surprises avoid important dates make the smallest change possible
  169. 169. CONTAINMENT run experiments during office hours let others know what you’re doing, no surprises avoid important dates make the smallest change possible have a rollback plan before you start
  170. 170. DON’T START EXPERIMENTS IN PRODUCTION
  171. 171. by Russ Miles @russmiles source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
  172. 172. @theburningmonk theburningmonk.com chaos monkey kills an EC2 instance; latency monkey induces artificial delay in APIs; chaos gorilla kills an AWS Availability Zone; chaos kong kills an entire AWS region
  173. 173. @theburningmonk theburningmonk.com
  174. 174. @theburningmonk theburningmonk.com there are no servers to kill! SERVERLESS
  175. 175. by Russ Miles @russmiles source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
  176. 176. by Russ Miles @russmiles source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
  177. 177. @theburningmonk theburningmonk.com improperly tuned timeouts
  178. 178. @theburningmonk theburningmonk.com missing error handling
  179. 179. @theburningmonk theburningmonk.com missing fallbacks
  180. 180. @theburningmonk theburningmonk.com
  181. 181. @theburningmonk theburningmonk.com “what if DynamoDB has an elevated error rate?”
  182. 182. @theburningmonk theburningmonk.com hypothesis: the AWS SDK retries would handle it
  183. 183. DEMO TIME!
  184. 184. @theburningmonk theburningmonk.com result: function times out after 6s (hypothesis is disproved)
  185. 185. @theburningmonk theburningmonk.com TIL: the js DynamoDB client defaults to 10 retries with base delay of 50ms
  186. 186. @theburningmonk theburningmonk.com TIL: the js DynamoDB client defaults to 10 retries with base delay of 50ms delay = Math.random() * (Math.pow(2, retryCount) * base) this is Marc Brooker’s fav formula!
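
To see why the defaults blow a short function timeout, here is the slide's formula (exponential backoff with full jitter) as a small helper plus the worst-case delays it can produce; the 50ms base matches the client default mentioned above.

    const BASE_DELAY_MS = 50

    function retryDelay(retryCount) {
      // exponential backoff with full jitter, exactly the formula on the slide
      return Math.random() * (Math.pow(2, retryCount) * BASE_DELAY_MS)
    }

    // the worst-case delay before the 10th retry alone is 2^9 * 50ms = 25.6s,
    // far beyond a 3-6s function timeout
    for (let i = 0; i < 10; i++) {
      console.log(`retry ${i + 1}: up to ${Math.pow(2, i) * BASE_DELAY_MS}ms`)
    }
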
  187. 187. @theburningmonk theburningmonk.com
  188. 188. @theburningmonk theburningmonk.com action: set max retry count + fallback
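
A sketch of that action with the AWS SDK for JavaScript v2: cap the retries and the per-request socket timeouts so they fit inside the function timeout, and fall back to a default value if DynamoDB is still unhappy. The table name, key and fallback value are hypothetical.

    const AWS = require('aws-sdk')

    const dynamodb = new AWS.DynamoDB.DocumentClient({
      service: new AWS.DynamoDB({
        maxRetries: 3, // down from the default of 10
        httpOptions: { connectTimeout: 1000, timeout: 1000 }
      })
    })

    const DEFAULT_CONFIG = { theme: 'default', features: [] }

    async function getConfig(configId) {
      try {
        const { Item } = await dynamodb.get({
          TableName: process.env.CONFIG_TABLE,
          Key: { configId }
        }).promise()
        return Item || DEFAULT_CONFIG
      } catch (err) {
        console.error('DynamoDB unavailable, falling back to defaults', err)
        return DEFAULT_CONFIG // the fallback keeps the request alive
      }
    }

    module.exports = { getConfig }
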
  189. 189. DEMO TIME!
  190. 190. @theburningmonk theburningmonk.com outcome: a more resilient system
  191. 191. @theburningmonk theburningmonk.com “what if service X has elevated latency?”
  192. 192. @theburningmonk theburningmonk.com hypothesis: our try-catch would handle it
  193. 193. DEMO TIME!
  194. 194. @theburningmonk theburningmonk.com result: function times out after 6s (hypothesis is disproved)
  195. 195. @theburningmonk theburningmonk.com TIL: most HTTP client libraries have a default timeout of 60s. API Gateway has an integration timeout of 29s. Most Lambda functions default to a timeout of 3-6s.
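
One way to avoid inheriting the HTTP client's 60s default is to derive the request timeout from the remaining invocation time, leaving a little headroom to return a fallback. A sketch; the use of axios, the buffer size and the URL are assumptions.

    const axios = require('axios')

    const TIMEOUT_BUFFER_MS = 500 // leave time to return a fallback/error response

    async function callService(url, context) {
      const timeout = Math.max(context.getRemainingTimeInMillis() - TIMEOUT_BUFFER_MS, 100)
      const response = await axios.get(url, { timeout })
      return response.data
    }

    module.exports.handler = async (event, context) => {
      try {
        const data = await callService('https://third-party.example.com/api', context)
        return { statusCode: 200, body: JSON.stringify(data) }
      } catch (err) {
        // timed out or failed downstream: respond fast instead of letting the function time out
        return { statusCode: 503, body: JSON.stringify({ error: 'service unavailable' }) }
      }
    }
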
  196. 196. @theburningmonk theburningmonk.com
  197. 197. @theburningmonk theburningmonk.com
  198. 198. @theburningmonk theburningmonk.com https://bit.ly/2Wvfort
  199. 199. @theburningmonk theburningmonk.com
  200. 200. @theburningmonk theburningmonk.com
  201. 201. DEMO TIME!
  202. 202. @theburningmonk theburningmonk.com outcome: a more resilient system
  203. 203. recap
  204. 204. everything fails, all the time
  205. 205. @theburningmonk theburningmonk.com “the capacity to recover quickly from difficulties; toughness.” resilience /rɪˈzɪlɪəns/ noun
  206. 206. @theburningmonk theburningmonk.com Serverless - multiple AZ’s out of the box Total resources created: 1 API Gateway 1 Lambda
  207. 207. @theburningmonk theburningmonk.com Beware of timeouts: API Gateway integration timeout (default: 29s); Lambda timeout (max: 15 minutes); SQS visibility timeout (default: 30s, min: 0s, max: 12 hours)
  208. 208. @theburningmonk theburningmonk.com Offload computing operations to queues
  209. 209. @theburningmonk theburningmonk.com “Decoupled Invocation”
  210. 210. @theburningmonk theburningmonk.com no distributed transactions
  211. 211. @theburningmonk theburningmonk.com retry-until-success
  212. 212. @theburningmonk theburningmonk.com
  213. 213. @theburningmonk theburningmonk.com retry-until-success needs to deal with poison messages
  214. 214. @theburningmonk theburningmonk.com Mind the poison message 6, 3, 1, 1, 1, 1, … only count the “same” batch
  215. 215. @theburningmonk theburningmonk.com Load shedding implement load shedding prioritize newer messages with a better chance to succeed
  216. 216. @theburningmonk theburningmonk.com circuit breaker pattern When circuit is open, fail fast but, allow 1 request through every Y mins If request succeeds, close the circuit After X consecutive timeouts, trip the circuit
  217. 217. @theburningmonk theburningmonk.com The Saga pattern A pattern for managing failures where each action has a compensating action for rollback
  218. 218. @theburningmonk theburningmonk.com Mind the partial failures
  219. 219. @theburningmonk theburningmonk.com Lambda autoscaling. Burst concurrency limits: 3000 (US West (Oregon), US East (N. Virginia), Europe (Ireland)); 1000 (Asia Pacific (Tokyo), Europe (Frankfurt)); 500 (other regions). Then: 500 additional instances per minute. Standard burst concurrency limits apply when over the provisioned capacity; provisioned capacity is adjustable based on CloudWatch metrics.
  220. 220. @theburningmonk theburningmonk.com Scenario: predictable spikes scheduled auto-scaling: the burst limits still apply, so factor in the timing
  221. 221. @theburningmonk theburningmonk.com eu-west-1 us-east-1 us-west-1 GlobalTable
  222. 222. @theburningmonk theburningmonk.com “the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production” principlesofchaos.org
  223. 223. by Russ Miles @russmiles source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
  224. 224. by Russ Miles @russmiles source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
  225. 225. @theburningmonk theburningmonk.com
  226. 226. https://theburningmonk.com/hire-me Advise, Training, Delivery “Fundamentally, Yan has improved our team by increasing our ability to derive value from AWS and Lambda in particular.” Nick Blair, Tech Lead
  227. 227. @theburningmonk theburningmonk.com lambdabestpractice.com bit.ly/complete-guide-to-aws-step-functions 20% off my courses aws-delhi-may2020
  228. 228. @theburningmonk theburningmonk.com github.com/theburningmonk
