
AWS Lambda from the Trenches

In this talk, we discussed the lessons we have learnt about operating AWS Lambda in production whilst transitioning from a traditional, monolithic system.

Published in: Technology

  1. 1. AWS LAMBDA FROM THE TRENCHES: what you should know before you go to production
  2. 2. hi, my name is Yan Cui
  3. 3. @theburningmonk
  4. 4. - Dan North “lead time to someone saying thank you is the only reputation metric that matters.”
  5. 5. complexity OUTSIDE the code: security, deployment, load balancing, caching, monitoring, config management, centralised logging, elastic scaling, setup server (https://www.infoq.com/presentations/complexity-simplicity-esb)
  6. 6. THERE IS NO SERVER
  7. 7. automatic scaling
  8. 8. minimise undifferentiated heavy-lifting
  9. 9. simple, fast deployment
  10. 10. - Dan North “lead time to someone saying thank you is the only reputation metric that matters.”
  11. 11. cost saving
  12. 12. not paying for idle servers
  13. 13. energy efficiency in DCs
  14. 14. easy to get started
  15. 15. fuelling the Yubl platform evolution
  16. 16. completely rebuilt search
  17. 17. Legacy Monolith Amazon Kinesis Amazon Lambda
  18. 18. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  19. 19. analytics pipeline
  20. 20. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery
  21. 21. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery 1 developer, 2 days from design to production (his 1st serverless project)
  22. 22. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery “nothing ever got done this fast at Skype!” - Chris Twamley
  23. 23. - Dan North “lead time to someone saying thank you is the only reputation metric that matters.”
  24. 24. Facebook login
  25. 25. Amazon Lambda GrapheneDB Amazon API Gateway Amazon API Gateway Amazon Lambda Facebook Graph API
  26. 26. and many more…
  27. 27. GET PRODUCTION-READY
  28. 28. USE A DEPLOYMENT FRAMEWORK
  29. 29. http://serverless.com
  30. 30. http://apex.run
  31. 31. https://github.com/claudiajs/claudia
  32. 32. TESTING
  33. 33. Amazon Lambda Amazon Kinesis Amazon IOT Amazon IOT
  34. 34. “I thought of objects being like biological cells and/or individual computers on a network, only able to communicate with messages.” - Alan Kay
  35. 35. Amazon Lambda Amazon Kinesis Amazon IOT Amazon IOT
  36. 36. “OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things.” - Alan Kay
  37. 37. amzn.to/29Lxuzu
  38. 38. Level of Testing 1.Unit do our objects do the right thing? are they easy to work with?
  39. 39. Level of Testing 1.Unit 2.Integration does our code work against code we can’t change?
  40. 40. handler
  41. 41. handler test by invoking the handler
  42. 42. Level of Testing 1.Unit 2.Integration 3.Acceptance does the whole system work?
  43. 43. Level of Testing unit integration acceptance
  44. 44. Level of Testing unit integration acceptance can do all 3 with Lambda
  45. 45. “…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise. The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…” Don’t Mock Types You Can’t Change
  46. 46. “…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do… Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…” Don’t Mock Types You Can’t Change
  47. 47. Don’t Mock Types You Can’t Change Services
  48. 48. “…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its interface…” Testing End-to-End
  49. 49. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  50. 50. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input
  51. 51. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input Validate
  52. 52. “…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed… This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…” Testing End-to-End
  53. 53. Jenkins build config deploys and tests unit + integration tests deploy acceptance tests
  54. 54. build.sh allows repeatable builds on both local & CI
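The "test by invoking the handler" idea from slide 41 can be sketched as follows. This is a minimal illustration, not the speaker's code: the `handler` function, the `queryStringParameters` event shape, and the response format are all assumptions made for the example.

```python
import json


def handler(event, context):
    """Hypothetical API-style handler: greets the caller by name."""
    # Assumed event shape; the query string may be missing or None.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}


def test_handler_greets_by_name():
    # Integration-style test: call the handler directly with a sample event,
    # exactly as the Lambda runtime would, instead of testing internals.
    event = {"queryStringParameters": {"name": "Yan"}}
    response = handler(event, context=None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"message": "hello, Yan"}
```

Because the handler is an ordinary function, the same test can run locally and on CI without any deployment.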
  55. 55. TEAM WORK
  56. 56. shared environments GOALS
  57. 57. easily propagate environmental changes GOALS
  58. 58. PRO TIP don’t ignore _meta
  59. 59. centralised config service
  60. 60. config service goes here
  61. 61. APP SECRETS
  62. 62. GOALS sensitive data are encrypted at rest (credentials, connection string, etc.)
  63. 63. GOALS has to work on CI
  64. 64. GOALS role-based access
  65. 65. hand-rolled with KMS (encrypted at rest)
  66. 66. hand-rolled with KMS
  67. 67. plug-ins serverless-plugin-kmsvariables serverless-secrets serverless-meta-sync
  68. 68. centralised config service
  69. 69. DOCUMENTATION
  70. 70. set goals
  71. 71. set goals choose a way
  72. 72. set goals choose a way document
  73. 73. create project templates/scaffolds
  74. 74. set goals choose a way evaluate document
  75. 75. set goals choose a way evaluate document
  76. 76. set goals choose a way evaluate document share
  77. 77. LOGGING
  78. 78. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
  79. 79. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now? UTC Timestamp API Gateway Request Id your log message
  80. 80. organised by Function + Version
  81. 81. LOG OVERLOAD
  82. 82. centralise your logs
  83. 83. CloudWatch Logs AWS Lambda LogStash ElasticSearch
  84. 84. CloudWatch Logs AWS Lambda LogStash ElasticSearch AWS Elasticsearch
  85. 85. CloudWatch Logs AWS Lambda LogStash ElasticSearch AWS Elasticsearch Elastic Cloud
  86. 86. CloudWatch Logs AWS Lambda LogStash ElasticSearch AWS Elasticsearch Elastic Cloud ?
  87. 87. correlation IDs
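The log line layout on slide 79 (UTC timestamp, request ID, message) and the correlation ID idea on slide 87 might be handled along these lines. This is a sketch under assumptions: the fields are taken to be whitespace-separated as they appear in the transcript, and the `x-correlation-id` header name is hypothetical.

```python
from typing import NamedTuple


class LogLine(NamedTuple):
    timestamp: str
    request_id: str
    message: str


def parse_log_line(line: str) -> LogLine:
    """Split a log line into timestamp, request ID and free-text message."""
    # Only the first two whitespace runs are separators; the message keeps its spaces.
    timestamp, request_id, message = line.strip().split(None, 2)
    return LogLine(timestamp, request_id, message)


def correlation_id(headers: dict, request_id: str) -> str:
    """Reuse the caller's correlation ID if present, else start one from the request ID.

    The header name is an assumption made for this example.
    """
    return headers.get("x-correlation-id") or request_id
```

Passing the returned correlation ID on every downstream call lets centralised logs be joined across functions.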
  88. 88. MONITORING
  89. 89. PRO TIP set up dashboards
  90. 90. PRO TIP don’t forget to set up alarms
  91. 91. PRO TIP add application-level metrics
  92. 92. ERROR HANDLING
  93. 93. “how do I return HTTP error codes?”
  94. 94. { "status" : 404, "errorMessage" : "oops" }
  95. 95. { "status" : 404, "errorMessage" : "oops" }
  96. 96. s-templates.json { "status" : 404, "errorMessage" : "oops" }
  97. 97. PRO TIP map timeouts to 504
  98. 98. every Lambda function has a timeout setting
  99. 99. use error regex to map it to a HTTP 504
  100. 100. s-templates.json
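The error-regex mapping from slides 93 to 100 can be simulated locally to check patterns before putting them in a template. This sketch only imitates the idea in plain Python; it is not the API Gateway configuration itself. The patterns, the whole-string matching, the "Task timed out after N seconds" wording, and the 500 fallback are assumptions for illustration.

```python
import re

# (pattern, HTTP status) pairs, checked in order. DOTALL lets ".*" span newlines.
STATUS_PATTERNS = [
    (re.compile(r'.*"status"\s*:\s*404.*', re.DOTALL), 404),
    (re.compile(r".*Task timed out after \d+\.\d+ seconds.*", re.DOTALL), 504),
]


def status_for(error_message):
    """Return the HTTP status a given Lambda error message would map to."""
    if error_message is None:
        return 200  # no error: default success response
    for pattern, status in STATUS_PATTERNS:
        # Assumption: the pattern must match the whole message.
        if pattern.fullmatch(error_message):
            return status
    return 500  # assumed catch-all for unmatched errors
```

For example, `status_for('{ "status" : 404, "errorMessage" : "oops" }')` selects the 404 mapping, while a timeout message selects 504.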
  101. 101. PRO TIP avoid using 128 MB setting for production
  102. 102. continuous timeout loop…
  103. 103. PRO TIP proactively time out your function
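One way to "proactively time out" (slide 103) is to check the remaining time before each unit of work and stop with a clear error while there is still time to report it. The sketch below relies on `get_remaining_time_in_millis()`, which the Python Lambda context object provides; the safety margin, the `process_all` helper, and `FakeContext` are hypothetical names for the example.

```python
SAFETY_MARGIN_MS = 500  # assumed margin; tune to how long cleanup and logging take


class TimeBudgetExceeded(Exception):
    """Raised by our own code before Lambda kills the invocation."""


def process_all(items, context, work):
    """Apply work() to each item, stopping early if time is nearly up."""
    done = []
    for item in items:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            raise TimeBudgetExceeded(f"stopped after {len(done)} of {len(items)} items")
        done.append(work(item))
    return done


class FakeContext:
    """Local stand-in for the Lambda context object, for tests only."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def get_remaining_time_in_millis(self):
        return next(self._readings)
```

Raising a named error gives logs and alarms something specific to match on, instead of a generic timeout.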
  104. 104. “what’s the retry strategy with Kinesis and SNS?”
  105. 105. “…If the invocation for one record times out, is throttled, or encounters any other error, Lambda will retry until it succeeds (or the record reaches its 24-hour expiration) before moving on to the next record…” http://aws.amazon.com/lambda/faqs
  106. 106. • do nothing • swallow errors • track retry count effort
  107. 107. • retry forever • no retry • retry N times
  108. 108. PRO TIP use local state to track no. of retries; move on after N retries
  109. 109. PRO TIP record CloudWatch metrics for error count; alarm if necessary
  110. 110. retried 3-5 times
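The "use local state to track no. of retries; move on after N retries" tip from slide 108 might look roughly like this for a stream where a failed record is redelivered until it succeeds. `MAX_RETRIES`, `process_record`, and the record ID key are assumptions for the sketch. The counter lives in module-level state, so it is lost whenever the function's container is recycled and is best-effort only.

```python
MAX_RETRIES = 3  # assumed value for N

# Module-level state survives across invocations only while the container is reused.
_attempts = {}


def process_record(record_id, payload, work):
    """Return True if processed, False if given up on; re-raise to request a retry."""
    try:
        work(payload)
    except Exception:
        _attempts[record_id] = _attempts.get(record_id, 0) + 1
        if _attempts[record_id] > MAX_RETRIES:
            # Retries exhausted: swallow the error so the stream can move on.
            # A real handler would log the record and emit an error metric here.
            _attempts.pop(record_id)
            return False
        raise  # let the failure propagate so the record is retried
    _attempts.pop(record_id, None)
    return True
```

With `MAX_RETRIES = 3`, a record that always fails raises on its first attempt and on three retries, and is then skipped.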
  111. 111. KEEP WARM
  112. 112. functions are unloaded if idle for a while
  113. 113. noticeable cold start time (package size matters)
  114. 114. CloudWatch Event AWS Lambda
  115. 115. CloudWatch Event AWS Lambda ping ping ping ping
  116. 116. CloudWatch Event AWS Lambda ping ping ping ping
  117. 117. CloudWatch Event AWS Lambda ping ping ping ping HEALTH CHECKS?
  118. 118. even then…
  119. 119. functions are recycled every few hours
  120. 120. functions are recycled every few hours
  121. 121. PRO TIP don’t make hard assumptions about function lifetime
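The keep-warm ping from slides 114 to 117 can be handled by short-circuiting when the event comes from a CloudWatch scheduled rule. This sketch assumes the scheduled event carries `"source": "aws.events"` and `"detail-type": "Scheduled Event"`; `do_real_work` and the return values are hypothetical.

```python
def is_keep_warm_ping(event):
    """True when the event looks like a CloudWatch scheduled event."""
    return (
        event.get("source") == "aws.events"
        and event.get("detail-type") == "Scheduled Event"
    )


def do_real_work(event):
    """Hypothetical placeholder for the function's actual business logic."""
    return {"processed": True}


def handler(event, context):
    if is_keep_warm_ping(event):
        # Return immediately: the ping only exists to keep the container loaded.
        return {"warmed": True}
    return do_real_work(event)
```

Since functions are still recycled every few hours, the ping reduces cold starts but does not remove them.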
  122. 122. KNOW YOUR LIMITS
  123. 123. max 50 MB deployment package size
  124. 124. max 50 MB deployment package size max 75 GB total deployment package size* * limit is per AWS region
  125. 125. Janitor Monkey
  126. 126. Janitor Lambda
  127. 127. max 5 mins execution time
  128. 128. max 6 MB request payload size* max 6 MB response payload size * for a request-response event type
  129. 129. default max 100 concurrent executions* * soft-limit, can be raised via support ticket
  130. 130. looking ahead
  131. 131. .NET Core? SQS support?
  132. 132. v1.0 (coming soon)
  133. 133. MULTI-CLOUD FUTURE?
  134. 134. IBM OpenWhisk Amazon Lambda Azure Functions Google Cloud Functions competition faster innovation lower prices
  135. 135. @theburningmonk @theburningmonk theburningmonk.com github.com/theburningmonk
