Serverless in production, an experience report (codemotion milan)

AWS Lambda has changed the way we deploy and run software, but the serverless paradigm brings new challenges to old problems: How do you test cloud-hosted functions locally? How do you monitor them? What about logging and config management? And how do you start migrating from an existing architecture?

Yan Cui shares solutions to these challenges, drawing on his experience running Lambda in production and migrating from an existing monolithic architecture.

Published in: Technology

  1. 1. AWS LAMBDA from the TRENCHES: what you should know before you go to production
  2. 2. hi, I’m Yan Cui
  3. 3. hi, I’m Yan Cui, AWS user since 2009
  4. 4. apr, 2016
  5. 5. hidden complexities and dependencies; low utilisation to leave room for traffic spikes; EC2 scaling is slow, so scale earlier; lots of cost for unused resources; up to 30 mins for deployment; deployment required downtime
  6. 6. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  7. 7. “what would good look like for us?”
  8. 8. DEPLOYMENTS SHOULD... be small; be fast; have zero downtime; have no lock-step
  9. 9. FEATURES SHOULD... be deployable independently; be loosely-coupled
  10. 10. WE WANT TO... minimise cost for unused resources; minimise ops effort; reduce tech mess; deliver visible improvements faster
  11. 11. nov, 2016
  12. 12. 170 Lambda functions in prod; 1.2 GB deployment packages in prod; 95% cost saving vs EC2; 15x no. of prod releases per month
  13. 13. time is a good fit
  14. 14. 1st function in prod! time is a good fit
  15. 15. ? time is a good fit 1st function in prod!
  16. 16. ALERTING CI / CD TESTING LOGGING MONITORING
  17. 17. Principles: what is good? Practices: how to make it good? Tools: with what?
  18. 18. Principles outlast Tools
  19. 19. 170 functions WOOF! ? ? time is a good fit 1st function in prod!
  20. 20. SECURITY DISTRIBUTED TRACING CONFIG MANAGEMENT
  21. 21. evolving the PLATFORM
  22. 22. rebuilt search
  23. 23. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch
  24. 24. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  25. 25. new analytics pipeline
  26. 26. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery
  27. 27. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery 1 developer, 2 days from design to production (his 1st serverless project)
  28. 28. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery “nothing ever got done this fast at Skype!” - Chris Twamley
  29. 29. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  30. 30. Rebuilt with Lambda
  31. 31. Rebuilt with Lambda
  32. 32. BigQuery
  33. 33. BigQuery
  34. 34. grapheneDB BigQuery
  35. 35. grapheneDB BigQuery
  36. 36. grapheneDB BigQuery
  37. 37. getting PRODUCTION READY
  38. 38. CHOOSE A FRAMEWORK DEPLOYMENT
  39. 39. http://serverless.com
  40. 40. https://github.com/awslabs/serverless-application-model
  41. 41. http://apex.run
  42. 42. https://apex.github.io/up
  43. 43. https://github.com/claudiajs/claudia
  44. 44. https://github.com/Miserlou/Zappa
  45. 45. http://gosparta.io/
  46. 46. TESTING
  47. 47. amzn.to/29Lxuzu
  48. 48. Level of Testing 1.Unit do our objects do the right thing? are they easy to work with?
  49. 49. Level of Testing 1.Unit 2.Integration does our code work against code we can’t change?
  50. 50. handler
  51. 51. handler test by invoking the handler
  52. 52. Level of Testing 1.Unit 2.Integration 3.Acceptance does the whole system work?
  53. 53. Level of Testing unit integration acceptance feedback confidence
  54. 54. “…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise. The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…” Don’t Mock Types You Can’t Change
  55. 55. “…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do… Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…” Don’t Mock Types You Can’t Change
  56. 56. Don’t Mock Types You Can’t Change Services
  57. 57. Paul Johnston The serverless approach to testing is different and may actually be easier. http://bit.ly/2t5viwK
  58. 58. Lambda API Gateway DynamoDB
  59. 59. Lambda API Gateway DynamoDB Unit Tests
  60. 60. Lambda API Gateway DynamoDB Unit Tests Mock/Stub
  61. 61. what unit tests will not tell you… (Lambda, API Gateway, DynamoDB) is our request correct? is the request mapping set up correctly? are the API resources configured correctly? are we assuming the correct schema? is Lambda proxy configured correctly? is IAM policy set up correctly? is the table created?
  62. 62. observation: most Lambda functions are simple and have a single purpose; the risk of shipping broken software has largely shifted to how they integrate with external services
  63. 63. But it slows down my feedback loop… IT’S NOT ABOUT YOU!
  64. 64. …if a service can’t provide you with a relatively easy way to test the interface in reality, then you should consider using another one. Paul Johnston
  65. 65. “…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its interface…” Testing End-to-End
  66. 66. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  67. 67. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input
  68. 68. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input Validate
  69. 69. integration tests exercise system’s Integration with its external dependencies
  70. 70. acceptance tests exercise system End-to-End from the outside
  71. 71. observation: integration tests differ from acceptance tests only in HOW the Lambda functions are invoked
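
The observation on slide 71 can be sketched as one test case with two interchangeable ways of invoking the function. This is a minimal illustration, not the speaker's code: the `handler`, the `/search` path, and the injected `httpGet` client are all hypothetical names introduced here.

```javascript
// Hypothetical handler under test, written in the API Gateway Lambda-proxy style.
const handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify({ query: event.queryStringParameters.q }),
});

// Builds the stub event used when the handler is invoked in-process.
const toEvent = (query) => ({ queryStringParameters: { q: query } });

// Integration mode: invoke the handler function directly.
const viaHandler = (fn) => async (query) => {
  const res = await fn(toEvent(query));
  return { statusCode: res.statusCode, body: JSON.parse(res.body) };
};

// Acceptance mode: call the deployed endpoint over HTTP.
// `httpGet` is an injected, hypothetical client resolving to { statusCode, body }.
const viaHttp = (httpGet, baseUrl) => async (query) => {
  const res = await httpGet(`${baseUrl}/search?q=${encodeURIComponent(query)}`);
  return { statusCode: res.statusCode, body: JSON.parse(res.body) };
};

// The test case is written once and does not know which mode it runs in.
const searchEchoesQuery = async (invoke) => {
  const res = await invoke("lambda");
  return res.statusCode === 200 && res.body.query === "lambda";
};
```

Under these assumptions, `searchEchoesQuery(viaHandler(handler))` runs as an integration test and `searchEchoesQuery(viaHttp(client, url))` runs the same assertion as an acceptance test against a deployed stage.
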
  72. 72. CI + CD PIPELINE
  73. 73. “the earlier you consider CI + CD, the more time you save in the long run” - me
  74. 74. “…We prefer to have the end-to-end tests exercise both the system and the process by which it’s built and deployed… This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software’s lifetime…” Testing End-to-End
  75. 75. “deployment scripts that only live on the CI box are a disaster waiting to happen” - me
  76. 76. Jenkins build config deploys and tests unit + integration tests deploy acceptance tests
  77. 77.

          if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
            STAGE=$2
            REGION=$3
            PROFILE=$4

            npm install
            AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s $STAGE -r $REGION
          elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
            STAGE=$2
            REGION=$3
            PROFILE=$4

            npm install
            AWS_PROFILE=$PROFILE npm run int-$STAGE
          elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
            STAGE=$2
            REGION=$3
            PROFILE=$4

            npm install
            AWS_PROFILE=$PROFILE npm run acceptance-$STAGE
          else
            usage
            exit 1
          fi

  78. 78. build.sh allows repeatable builds on both local & CI
  79. 79. Auto Auto Manual
  80. 80. LOGGING
  81. 81. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
  82. 82. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now? (fields: UTC Timestamp, API Gateway Request Id, your log message)
  83. 83. function name date function version
  84. 84. “Logs are not easily searchable in CloudWatch Logs.” - me
  85. 85. LOG OVERLOAD
  86. 86. CENTRALISE LOGS
  87. 87. CENTRALISE LOGS MAKE THEM EASILY SEARCHABLE
  88. 88. Elasticsearch + Logstash + Kibana: the ELK stack
  89. 89. CloudWatch Logs
  90. 90. CloudWatch Logs AWS Lambda ELK stack
  91. 91. CloudWatch Events
  92. 92. http://bit.ly/2f3zxQG
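
The CloudWatch Logs → Lambda → ELK flow on slides 89 to 91 could start with something like the sketch below. It decodes the payload that CloudWatch Logs delivers to a subscribed function (base64-encoded, gzipped JSON under `event.awslogs.data`) and splits each line into the three parts shown on slide 82. It assumes those parts are tab-separated in the raw message, and it stops short of actually shipping anything; `toDocuments` and `parseLogMessage` are names invented for this example.

```javascript
const zlib = require("zlib");

// Decodes the payload CloudWatch Logs sends to a subscribed Lambda function:
// event.awslogs.data is a base64-encoded, gzipped JSON document.
const decodeLogsEvent = (event) => {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  return JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));
};

// Splits a log line into timestamp, request id and message.
// Assumption: the parts are tab-separated. Platform lines such as START, END
// and REPORT do not begin with an ISO timestamp and are skipped (null).
const parseLogMessage = (raw) => {
  const [timestamp, requestId, ...rest] = raw.trim().split("\t");
  if (!/^\d{4}-\d{2}-\d{2}T/.test(timestamp) || rest.length === 0) return null;
  return { timestamp, requestId, message: rest.join("\t") };
};

// Turns one subscription event into documents ready for a log store.
// Sending them on (for example to Logstash) is left out of this sketch.
const toDocuments = (event) => {
  const { logGroup, logStream, logEvents } = decodeLogsEvent(event);
  return logEvents
    .map((e) => parseLogMessage(e.message))
    .filter((doc) => doc !== null)
    .map((doc) => ({ ...doc, logGroup, logStream }));
};
```

Keeping `logGroup` and `logStream` on each document preserves the function name and version information that slide 83 points out.
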
  93. 93. DISTRIBUTED TRACING
  94. 94. “my followers didn’t receive my new post!” - a user
  95. 95. where could the problem be?
  96. 96. correlation IDs* (* e.g. request-id, user-id, yubl-id, etc.)
  97. 97. ROLL YOUR OWN CLIENTS
  98. 98. kinesis client http client sns client
  99. 99. http://bit.ly/2k93hAj
  100. 100. ROLL YOUR OWN CLIENTS X-RAY
  101. 101. Amazon X-Ray
  102. 102. Amazon X-Ray
  103. 103. traces do not span over API Gateway
  104. 104. http://bit.ly/2s9yxmA
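
A rough sketch of the "correlation IDs" and "roll your own clients" idea from slides 96 to 98: capture the IDs on the way in, then attach them to every outgoing call and log line. The `x-correlation-` header prefix and the function names are assumptions made for this example; only `context.awsRequestId` comes from the Lambda runtime itself.

```javascript
// Prefix marking a header as a correlation ID (an assumed convention).
const PREFIX = "x-correlation-";

// Collects correlation IDs from incoming API Gateway event headers, seeding
// x-correlation-id from the Lambda request ID when the caller sent none.
const captureCorrelationIds = (event, context) => {
  const ids = {};
  for (const [name, value] of Object.entries(event.headers || {})) {
    const key = name.toLowerCase();
    if (key.startsWith(PREFIX)) ids[key] = value;
  }
  if (!ids[`${PREFIX}id`]) ids[`${PREFIX}id`] = context.awsRequestId;
  return ids;
};

// Wraps any HTTP client function so each outgoing request forwards the IDs.
// `send` is a hypothetical function taking { url, headers }.
const withCorrelationIds = (send, ids) => (request) =>
  send({ ...request, headers: { ...request.headers, ...ids } });

// Log lines carry the same IDs, so one ID finds every related entry.
const logLine = (ids, message) => JSON.stringify({ ...ids, message });
```

The same wrapping approach would apply to the Kinesis and SNS clients named on slide 98, with the IDs travelling in the record payload or message attributes instead of HTTP headers.
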
  105. 105. MONITORING + ALERTING
  106. 106. “where do I install monitoring agents?”
  107. 107. you can’t
  108. 108. • invocation Count • error Count • latency • throttling • granular to the minute • support custom metrics
  109. 109. • same metrics as CW • better dashboard • support custom metrics https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/
  110. 110. my code
  111. 111. my code
  112. 112. my code internet internet press button something happens
  113. 113. “how do I batch up and send logs in the background?”
  114. 114. you can’t (kinda)
  115. 115. logs:

           console.log("hydrating yubls from db…");
           console.log("fetching user info from user-api");

       metrics (MONITORING|timestamp|metric value|metric type|metric name):

           console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
           console.log("MONITORING|1489795335|8|count|yubls-served");

  116. 116. CloudWatch Logs AWS Lambda ELK stack logs metrics CloudWatch
  117. 117. http://bit.ly/2gGredx
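
On the receiving side of slide 116, the log-processing function has to tell metric lines apart from ordinary logs. A minimal sketch of that split, using the `MONITORING|timestamp|value|type|name` layout from slide 115, might look like this. It assumes the function is handed the message part of each line only, and `parseMetric` and `partition` are names made up for the example; publishing the metrics to CloudWatch is not shown.

```javascript
// Parses a line of the form MONITORING|timestamp|value|type|name into a
// metric object, or returns null for ordinary log lines.
const parseMetric = (line) => {
  const parts = line.trim().split("|");
  if (parts.length !== 5 || parts[0] !== "MONITORING") return null;
  const [, timestamp, value, type, name] = parts;
  const metric = { timestamp: Number(timestamp), value: Number(value), type, name };
  const valid = Number.isFinite(metric.timestamp) && Number.isFinite(metric.value);
  return valid ? metric : null;
};

// Splits a batch of lines into metrics (bound for CloudWatch) and
// plain logs (bound for the ELK stack).
const partition = (lines) => {
  const metrics = [];
  const logs = [];
  for (const line of lines) {
    const metric = parseMetric(line);
    if (metric) metrics.push(metric);
    else logs.push(line);
  }
  return { metrics, logs };
};
```

Because the function under observation only writes to stdout, recording a metric adds no network call to its invocation.
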
  118. 118. DASHBOARDS
  119. 119. DASHBOARDS SET ALARMS
  120. 120. DASHBOARDS SET ALARMS TRACK APP-LEVEL METRICS
  121. 121. Not Only CloudWatch
  122. 122. “you really don't want your monitoring system to fail at the same time as the system it monitors” - me
  123. 123. CONFIG MANAGEMENT
  124. 124. easily and quickly propagate config changes
  125. 125. “Environment variables make it hard to share configurations across functions.” - me
  126. 126. “Environment variables make it hard to implement fine-grained access to sensitive info.” - me
  127. 127. CENTRALISED CONFIG SERVICE
  128. 128. config service goes here
  129. 129. SSM Parameter Store
  130. 130. sensitive data should be encrypted in-flight, and at rest (credentials, connection string, etc.)
  131. 131. role-based access
  132. 132. SSM Parameter Store HTTPS role-based access encrypted in-flight
  133. 133. SSM Parameter Store encrypt role-based access
  134. 134. SSM Parameter Store encrypted at-rest
  135. 135. HTTPS role-based access SSM Parameter Store encrypted in-flight
  136. 136. CENTRALISED CONFIG SERVICE CLIENT LIBRARY
  137. 137. fetch & cache at Cold Start
  138. 138. invalidate at interval + signal
  139. 139. http://bit.ly/2yLUjwd
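
Slides 137 and 138 ("fetch & cache at Cold Start", "invalidate at interval + signal") describe the behaviour of the config client library. The sketch below shows one way that behaviour could be expressed; it is not the library linked above. The `load` function stands in for a call to the config service (for instance SSM Parameter Store) and is hypothetical, as are `createCachedConfig`, `ttlMs`, and `now`.

```javascript
// Creates a getter that loads config on first use (that is, at cold start),
// reuses it on later invocations of the same container, and reloads once
// `ttlMs` has elapsed. `load` is a hypothetical function that fetches from
// the config service; it may return a value or a Promise, and whatever it
// returns is what gets cached.
const createCachedConfig = ({ load, ttlMs, now = Date.now }) => {
  let cached;
  let expiresAt = -Infinity;

  const get = () => {
    if (now() >= expiresAt) {
      cached = load();
      expiresAt = now() + ttlMs;
    }
    return cached;
  };

  // Lets an external signal force a reload on the next get().
  const invalidate = () => {
    expiresAt = -Infinity;
  };

  return { get, invalidate };
};
```

Declared outside the handler, the returned object survives between invocations of a warm container, which is what makes the cache useful. A fuller version would also want to avoid caching a failed load for the whole interval.
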
  140. 140. PRO TIPS
  141. 141. max 75 GB total deployment package size* (* limit is per AWS region)
  142. 142. Janitor Monkey
  143. 143. Janitor Lambda http://bit.ly/2xzVu4a
  144. 144. disable versionFunctions in
  145. 145. install Serverless framework as dev dependency at project level; dev dependencies are excluded since 1.16.0
  146. 146. http://bit.ly/2vzBqhC
  147. 147. http://amzn.to/2vtUkDU
  148. 148. UNDERSTAND COLDSTARTS
  149. 149. Amazon X-Ray 1st invocation 2nd invocation cold start
  150. 150. source: http://bit.ly/2oBEbw2
  151. 151. http://bit.ly/2rtCCBz
  152. 152. C# http://bit.ly/2rtCCBz
  153. 153. Java http://bit.ly/2rtCCBz
  154. 154. NodeJs, Python http://bit.ly/2rtCCBz
  155. 155. AVOID COLDSTARTS
  156. 156. CloudWatch Event AWS Lambda
  157. 157. CloudWatch Event AWS Lambda ping ping ping ping
  158. 158. CloudWatch Event AWS Lambda ping ping ping ping
  159. 159. CloudWatch Event AWS Lambda ping ping ping ping HEALTH CHECKS?
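
The ping pattern on slides 156 to 159 needs the function to recognise a scheduled ping and return before doing real work. A small sketch under that reading: scheduled CloudWatch Events carry `source: "aws.events"`, which is what `isPing` checks. The `keepWarm` wrapper and the `{ warmed: true }` response are illustrative choices, and the async handler style assumes a runtime that supports it.

```javascript
// True when the invocation came from a CloudWatch Events schedule rather
// than a real caller. Scheduled events carry source "aws.events".
const isPing = (event) => Boolean(event) && event.source === "aws.events";

// Wraps a handler so pings return early, before any real work is done.
const keepWarm = (handler) => async (event, context) => {
  if (isPing(event)) return { warmed: true };
  return handler(event, context);
};
```

Usage would be along the lines of `module.exports.handler = keepWarm(realHandler)`. As slide 159 hints, the early-return branch is also a natural place to run a lightweight health check.
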
  160. 160. AVOID HARD ASSUMPTIONS ABOUT FUNCTION LIFETIME
  161. 161. USE STATE FOR OPTIMISATION
  162. 162. max 5 mins execution time
  163. 163. USE RECURSION FOR LONG RUNNING TASKS
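
One way to read "use recursion for long running tasks" against the execution time limit on slide 162: work through the input until the invocation is close to timing out, then hand the remainder to a fresh invocation. The sketch below assumes that reading. `context.getRemainingTimeInMillis()` is provided by the Lambda runtime; `processItem`, `recurse`, and the 10-second margin are hypothetical, and a real `recurse` would asynchronously invoke the same function with the new position.

```javascript
// Assumed headroom left for the hand-off itself.
const SAFETY_MARGIN_MS = 10000;

// Processes items from `position` onward until the invocation is close to
// timing out, then passes the remaining work to a new invocation.
// `processItem` and `recurse` are hypothetical, injected functions.
const run = ({ items, position = 0 }, context, processItem, recurse) => {
  let next = position;
  while (next < items.length) {
    if (context.getRemainingTimeInMillis() < SAFETY_MARGIN_MS) {
      recurse({ items, position: next });
      return { done: false, position: next };
    }
    processItem(items[next]);
    next += 1;
  }
  return { done: true, position: next };
};
```

Carrying the position in the payload keeps each invocation stateless, in line with slide 160's advice not to make hard assumptions about function lifetime.
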
  164. 164. @theburningmonk theburningmonk.com github.com/theburningmonk
  165. 165. @theburningmonk theburningmonk.com github.com/theburningmonk http://bit.ly/2yQZj1H all my blog posts on Lambda
