
Apply best parts of microservices to serverless

Whenever a new paradigm comes along, we often cast the previous incumbents as relics to be forgotten by history, only to then repeat the same mistakes as they once did. On the surface Serverless has revolutionised how we build and run software, but deep down we are still building microservices and face the same challenges. As more of us adopt Serverless and build increasingly complex systems using this new paradigm, it's important to take a moment to reflect on the lessons others have learnt about building microservices and how they can be applied to our Serverless applications.

Published in: Technology

  1. 1. applying the best parts of Microservices to Serverless
  2. 2. Yan Cui http://theburningmonk.com @theburningmonk Principal Engineer @
  3. 3. follow @DAZN_ngnrs for updates about the engineering team We’re hiring! Visit engineering.dazn.com to learn more.
  4. 4. 2006
  5. 5. 2010
  6. 6. 2010
  7. 7. 2016
  8. 8. 2016
  9. 9. SQL NoSQL OOP Functional On Premise Cloud Waterfall Agile Monoliths Microservices
  10. 10. Server-ful Serverless
  11. 11. https://en.wikipedia.org/wiki/Hype_cycle
  12. 12. https://gtnr.it/2KGyGCM
  13. 13. what’s this?
  14. 14. what’s this? this solves all my problems!
  15. 15. what’s this? this solves all my problems! this is rubbish!
  16. 16. what’s this? this solves all my problems! this is rubbish! I’m starting to get it..
  17. 17. what’s this? this solves all my problems! this is rubbish! I’m starting to get it.. I know what I’m doing
  18. 18. SQL NoSQL OOP Functional On Premise Cloud Waterfall Agile Monoliths Microservices Server-ful Serverless
  19. 19. “those who cannot remember the past are condemned to repeat it” - George Santayana
  20. 20. what’s this? this solves all my problems! this is rubbish! I’m starting to get it.. I know what I’m doing
  21. 21. lesson 1. don’t fly blind
  22. 22. 2017 observability
  23. 23. http://bit.ly/2EXQZBj
  24. 24. http://bit.ly/2EXKEFZ
  25. 25. “These are the four pillars of the Observability Engineering team’s charter: • Monitoring • Alerting/visualization • Distributed systems tracing infrastructure • Log aggregation/analytics” - Observability Engineering at Twitter, http://bit.ly/2DnjyuW
  26. 26. NO ACCESS to underlying OS
  27. 27. NOWHERE to install agents/daemons
  28. 28. [diagram: user requests, each served by a handler] critical paths: minimise user-facing latency
  29. 29. [diagram: user requests, each served by a handler, with StatsD and rsyslog alongside] critical paths: minimise user-facing latency; background processing: batched, asynchronous, low overhead
  30. 30. [diagram: user requests, each served by a handler, with StatsD and rsyslog alongside] critical paths: minimise user-facing latency; background processing: batched, asynchronous, low overhead; NO background processing except what platform provides
  31. 31. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
  32. 32. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now? (fields: UTC timestamp, request ID, your log message)
  33. 33. one log group per function one log stream for each concurrent invocation
  34. 34. “logs are not easily searchable in CloudWatch Logs” - me
  35. 35. CloudWatch Logs
  36. 36. CloudWatch Logs AWS Lambda ELK stack
  37. 37. http://bit.ly/lambda-log-aggregation
  38. 38. “you need to use structured logging” - me
  39. 39. CloudWatch Logs $0.50 per GB ingested $0.03 per GB archived per month
  40. 40. CloudWatch Logs $0.50 per GB ingested $0.03 per GB archived per month 1M invocations of a 128MB function (each billed for 100ms) = $0.000000208 * 1M + $0.20 = $0.408
  41. 41. DON’T leave debug logging ON in production
  42. 42. “you need to sample debug logs in production” - me
  43. 43. volume of logs observability all debug logs no debug logs sampled debug logs hurts mean time to resolution (MTTR) during a production incident
  44. 44. volume of logs observability all debug logs no debug logs sampled debug logs $$$$$$
  45. 45. “always log the invocation event on error” - me
  46. 46. http://bit.ly/lambda-sample-debug-logs
  47. 47. “what about metrics?”
  48. 48. my code send metrics
  49. 49. my code send metrics
  50. 50. my code send metrics internet internet press button something happens
  51. 51. the extra 10-20ms spent sending custom metrics compounds when you have microservices and multiple APIs are called to serve a single user action
  52. 52. Amazon found every 100ms of latency cost them 1% in sales. http://bit.ly/2EXPfbA
  53. 53. console.log("hydrating yubls from db…"); console.log("fetching user info from user-api"); console.log("MONITORING|1489795335|27.4|latency|user-api-latency"); console.log("MONITORING|1489795335|8|count|yubls-served"); (the first two lines are logs, the last two are metrics; metric format: MONITORING|timestamp|metric value|metric type|metric name)
  54. 54. CloudWatch Logs AWS Lambda ELK stack logs metrics CloudWatch
  55. 55. API Gateway send custom metrics asynchronously
  56. 56. SNS Kinesis S3 API Gateway … send custom metrics asynchronously send custom metrics as part of function invocation
  57. 57. http://bit.ly/2Dpidje
  58. 58. ? functions are often chained together via asynchronous invocations
  59. 59. ? SNS Kinesis CloudWatch Events CloudWatch Logs IoT DynamoDB S3 SES
  60. 60. ? SNS Kinesis CloudWatch Events CloudWatch Logs IoT DynamoDB S3 SES tracing ASYNCHRONOUS invocations through so many different event sources is difficult
  61. 61. X-Ray
  62. 62. do not span over API Gateway
  63. 63. narrow focus on a function: good for homing in on performance issues for a particular function, but offers little to help you build intuition about how your system operates as a whole.
  64. 64. don’t span over async invocations: good for identifying dependencies of a function, but not good enough for tracing the entire call chain as user request/data flows through the system via async event sources.
  65. 65. don’t span over non-AWS services
  66. 66. Nitzan Shapira @nitzanshapira Ran Ribenzaft @ranrib
  67. 67. correlation-IDs
  68. 68. correlation IDs* * eg. request-id, user-id, yubl-id, etc.
  69. 69. kinesis client http client sns client
  70. 70. http://bit.ly/lambda-correlation-ids
  71. 71. lesson 2. no shared DBs
  72. 72. shared DBs create TIGHT COUPLING between services
  73. 73. build loosely-coupled system through events
  74. 74. service A service B service C service D bounded context bounded context
  75. 75. service A service B service C service D bounded context bounded context
  76. 76. service A service B service C service D bounded context bounded context
  77. 77. service A service B service C service D bounded context bounded context
  78. 78. lesson 3. amortize spiky load between services
  79. 79. service A service B
  80. 80. downstream systems might not be as scalable
  81. 81. service A service B Kinesis Lambda
  82. 82. service A service B Kinesis Lambda concurrency == no. of shards
  83. 83. service A service B Kinesis Lambda retried until success
  84. 84. lesson 4. failures are inevitable
  85. 85. complex distributed systems fail in.. well, complex, sometimes cascaded ways..
  86. 86. the only way to truly know your system’s resilience against failures is to test it through controlled experiments
  87. 87. there is more inherent chaos and complexity in a Serverless architecture
  88. 88. smaller units of deployment but A LOT more of them!
  89. 89. more difficult to harden around boundaries serverful serverless
  90. 90. ? SNS Kinesis CloudWatch Events CloudWatch Logs IoT DynamoDB S3 SES
  91. 91. ? Kinesis CloudWatch Events CloudWatch Logs IoT DynamoDB S3 more intermediary services, and greater variety too SNS SES
  92. 92. ? Kinesis CloudWatch Events CloudWatch Logs IoT DynamoDB S3 more intermediary services, and greater variety too each with its own set of failure modes SNS SES
  93. 93. more configurations, more opportunities for misconfiguration serverful serverless
  94. 94. more unknown failure modes in infrastructure that we don’t control
  95. 95. often there’s little we can do when an outage occurs in the platform
  96. 96. improperly tuned timeouts
  97. 97. missing error handling
  98. 98. missing fallback when downstream is unavailable
  99. 99. FAILURE INJECTION
  100. 100. inject failures
  101. 101. inject failures validate failure handling
  102. 102. Recap
  103. 103. Server-ful Serverless
  104. 104. “those who cannot remember the past are condemned to repeat it” - George Santayana
  105. 105. don’t fly blind
  106. 106. no shared DBs
  107. 107. amortize spiky load between services
  108. 108. failures are inevitable
  109. 109. @theburningmonk theburningmonk.com github.com/theburningmonk
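
The cost comparison on slides 39-41 can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the prices quoted on the slides, assumes every invocation is billed for exactly one 100ms block, treats 1 GB as 10^9 bytes, and assumes a hypothetical 1 KB of log output per invocation. The function names are made up for this example, and current AWS pricing may differ.

```javascript
// Prices as quoted on the slides; they may not match current AWS pricing.
const PRICE_PER_100MS_128MB = 0.000000208; // compute, per 100ms billed block
const PRICE_PER_MILLION_REQUESTS = 0.20;   // request charge
const CLOUDWATCH_INGEST_PER_GB = 0.50;     // CloudWatch Logs ingestion

// Cost of running a 128MB function, assuming a fixed number of
// 100ms billed blocks per invocation (assumption: 1 by default).
function lambdaCost(invocations, billedBlocksPerInvocation = 1) {
  const compute = invocations * billedBlocksPerInvocation * PRICE_PER_100MS_128MB;
  const requests = (invocations / 1e6) * PRICE_PER_MILLION_REQUESTS;
  return compute + requests;
}

// Cost of ingesting the logs those invocations write (1 GB taken as 1e9 bytes).
function logIngestionCost(invocations, bytesLoggedPerInvocation) {
  const gigabytes = (invocations * bytesLoggedPerInvocation) / 1e9;
  return gigabytes * CLOUDWATCH_INGEST_PER_GB;
}

console.log(lambdaCost(1e6).toFixed(3));             // 0.408, matching slide 40
console.log(logIngestionCost(1e6, 1000).toFixed(3)); // 0.500
```

Under these assumptions, 1 KB of logging per invocation already costs more to ingest than the invocations themselves cost to run, which is the motivation for not leaving debug logging on in production.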
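
Slides 38-45 recommend structured logging, sampling debug logs in production, and always logging the invocation event on error. One way this could look is sketched below. It is not the implementation from the talk or the linked post: `createLogger`, `withSampledLogging`, the 1% default sample rate and the JSON field names are all hypothetical choices made for illustration.

```javascript
// Hypothetical structured logger: one JSON object per line, so a log
// aggregation pipeline can index fields instead of parsing free text.
function createLogger({ sampleRate = 0.01, random = Math.random, write = console.log } = {}) {
  // Decide once per invocation, so a sampled invocation keeps ALL of its debug logs.
  const debugEnabled = random() < sampleRate;
  const emit = (level, message, fields) =>
    write(JSON.stringify({ level, message, ...fields }));
  return {
    debugEnabled,
    debug: (message, fields) => { if (debugEnabled) emit("DEBUG", message, fields); },
    info: (message, fields) => emit("INFO", message, fields),
    error: (message, fields) => emit("ERROR", message, fields),
  };
}

// Wraps a handler so that debug logs are sampled and the invocation
// event is always logged when the handler fails.
function withSampledLogging(handler, options) {
  return async (event) => {
    const log = createLogger(options);
    try {
      return await handler(event, log);
    } catch (err) {
      log.error("invocation failed", { errorMessage: err.message, event });
      throw err; // rethrow so the platform still sees the failure
    }
  };
}

// Example handler: the debug line appears for roughly 1% of invocations.
const handler = withSampledLogging(async (event, log) => {
  log.debug("hydrating yubls from db", { userId: event.userId });
  return { statusCode: 200 };
});
```

Making the sampling decision per invocation rather than per log line keeps the debug trail of a sampled invocation complete, which tends to matter more during an incident than an even spread of isolated lines.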
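
Slide 53 writes metrics to stdout in a pipe-delimited format so that a log-processing function can forward them to CloudWatch asynchronously. A minimal sketch of the parsing side follows. It assumes the timestamp and request ID prefix shown on slide 32 has already been stripped from each line; the function names and the returned object shape are illustrative, not taken from the talk.

```javascript
// Parses the format shown on slide 53:
//   MONITORING|<timestamp>|<metric value>|<metric type>|<metric name>
// Returns null for anything that is not a well-formed metric line.
function parseMetric(line) {
  const parts = line.split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") return null;
  const [, timestamp, value, type, name] = parts;
  const parsed = { timestamp: Number(timestamp), value: Number(value), type, name };
  return Number.isNaN(parsed.timestamp) || Number.isNaN(parsed.value) ? null : parsed;
}

// Separates ordinary log lines from metric lines, so each can be
// forwarded to a different destination.
function splitLogsAndMetrics(lines) {
  const logs = [];
  const metrics = [];
  for (const line of lines) {
    const metric = parseMetric(line);
    if (metric) metrics.push(metric);
    else logs.push(line);
  }
  return { logs, metrics };
}

const { logs, metrics } = splitLogsAndMetrics([
  "fetching user info from user-api",
  "MONITORING|1489795335|27.4|latency|user-api-latency",
  "MONITORING|1489795335|8|count|yubls-served",
]);
console.log(logs.length, metrics.map((m) => m.name));
```

The function being measured only pays for a `console.log` call; publishing the metric happens later, off the critical path.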
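
Slides 67-69 suggest capturing correlation IDs and having the HTTP, SNS and Kinesis clients forward them automatically. The sketch below shows the idea for HTTP headers only. The `x-correlation-` prefix, the fallback to the current request ID and both function names are assumptions made for this example; the linked post may use different conventions.

```javascript
// Assumed naming convention: every correlation ID travels in a header
// (or message attribute) whose name starts with this prefix.
const CORRELATION_PREFIX = "x-correlation-";

// Collects correlation IDs from incoming headers. If the caller did not
// send a primary ID, the current request ID starts a new chain.
function extractCorrelationIds(headers = {}, fallbackRequestId) {
  const ids = {};
  for (const [key, value] of Object.entries(headers)) {
    const lower = key.toLowerCase();
    if (lower.startsWith(CORRELATION_PREFIX)) ids[lower] = value;
  }
  const primary = `${CORRELATION_PREFIX}id`;
  if (!ids[primary]) ids[primary] = fallbackRequestId;
  return ids;
}

// Adds the captured correlation IDs to the headers of an outgoing call.
function withCorrelationIds(correlationIds, headers = {}) {
  return { ...headers, ...correlationIds };
}

const incoming = { "X-Correlation-Id": "abc-123", "X-Correlation-User-Id": "u-1", Accept: "*/*" };
const ids = extractCorrelationIds(incoming, "994f18f9-482b-11e6-8668-53e4eab441ae");
console.log(withCorrelationIds(ids, { "content-type": "application/json" }));
```

The same captured object could be attached to every structured log line and copied into SNS message attributes or Kinesis record payloads, so that one ID follows a request across asynchronous hops.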
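
Slides 96-101 describe injecting failures to validate that timeouts, error handling and fallbacks behave as intended. A minimal in-process sketch is given below. It is an assumption-laden illustration rather than the tooling used in the talk: `withFailureInjection`, `withFallback` and their options are hypothetical, and the injected delay and error rate would normally come from configuration that can be switched off.

```javascript
// Wraps an async function so that calls are delayed and/or fail with a
// configurable probability. `random` is injectable to make tests deterministic.
function withFailureInjection(fn, { errorRate = 0, delayMs = 0, random = Math.random } = {}) {
  return async (...args) => {
    if (delayMs > 0) await new Promise((resolve) => setTimeout(resolve, delayMs));
    if (random() < errorRate) throw new Error("injected failure");
    return fn(...args);
  };
}

// The behaviour under test: fall back to a default value when the
// downstream call is unavailable.
async function withFallback(fn, fallbackValue) {
  try {
    return await fn();
  } catch (err) {
    return fallbackValue;
  }
}

// Hypothetical downstream call that always fails under injection.
const getUserInfo = withFailureInjection(async (userId) => ({ userId, name: "real" }), { errorRate: 1 });

withFallback(() => getUserInfo("u-1"), { userId: "u-1", name: "unknown" })
  .then((user) => console.log(user.name)); // prints "unknown"
```

Running the same experiment with `errorRate: 0` and a non-zero `delayMs` is a simple way to check whether timeouts are tuned tightly enough to protect user-facing latency.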
