Serverless in production, an experience report (IWOMM)

AWS Lambda has changed the way we deploy and run software, but the serverless paradigm brings new challenges to old problems: how do you test a cloud-hosted function locally? How do you monitor it? What about logging and config management? And how do you start migrating from an existing architecture?

In this talk Yan and Domas will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.

Published in: Technology

  1. 1. Serverless in production, an experience report: what you should know before you go to production
  2. 2. Yan Cui http://theburningmonk.com @theburningmonk Domas Lasauskas
  3. 3. apr, 2016
  4. 4. hey guys, vote on this post and I’ll announce a winner at 10PM tonight
  5. 5. 10PM traffic
  6. 6. 10PM traffic 70-100x
  7. 7. low utilisation to leave room for spikes; EC2 scaling is slow, so scale earlier
  8. 8. lots of $$$ for unused resources
  9. 9. up to 30 mins for deployment; deployment required downtime
  10. 10. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  11. 11. “what would good look like for us?”
  12. 12. DEPLOYMENTS SHOULD... be small; be fast; have zero downtime; have no lock-step
  13. 13. FEATURES SHOULD... be deployable independently; be loosely-coupled
  14. 14. WE WANT TO... minimise cost for unused resources; minimise ops effort; reduce tech mess; deliver visible improvements faster
  15. 15. nov, 2016
  16. 16. 170 Lambda functions in prod; 1.2 GB deployment packages in prod; 95% cost saving vs EC2; 15x no. of prod releases per month
  17. 17. time is a good fit
  18. 18. 1st function in prod! time is a good fit
  19. 19. ? time is a good fit 1st function in prod!
  20. 20. ALERTING CI / CD TESTING LOGGING MONITORING
  21. 21. Principles: what is good? Practices: how to make it good? Tools: with what?
  22. 22. Principles outlast Tools
  23. 23. 170 functions ? ? time is a good fit 1st function in prod!
  24. 24. SECURITY DISTRIBUTED TRACING CONFIG MANAGEMENT
  25. 25. rebuilt search
  26. 26. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch
  27. 27. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  28. 28. new analytics pipeline
  29. 29. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery
  30. 30. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery 1 developer, 2 days from design to production (his 1st serverless project)
  31. 31. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery “nothing ever got done this fast at Skype!” - Chris Twamley
  32. 32. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  33. 33. Rebuilt with Lambda
  34. 34. nov, 2016
  35. 35. evolving the PLATFORM
  36. 36. rebuilt search
  37. 37. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch
  38. 38. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  39. 39. new analytics pipeline
  40. 40. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery
  41. 41. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery 1 developer, 2 days from design to production (his 1st serverless project)
  42. 42. Legacy Monolith Amazon Kinesis Amazon Lambda Google BigQuery “nothing ever got done this fast at Skype!” - Chris Twamley
  43. 43. “lead time to someone saying thank you is the only reputation metric that matters.” - Dan North
  44. 44. Rebuilt with Lambda
  45. 45. Expensive operations SNS retries for “free”
  46. 46. nov, 2016
  47. 47. nov, 2016 recap: Decouple using events; Small iterative features; Proxy old endpoints; Focus on delivering value; Leverage cloud services
  48. 48. getting PRODUCTION READY
  49. 49. choose a tried-and-tested deployment framework, don’t invent your own
  50. 50. http://serverless.com
  51. 51. https://github.com/awslabs/serverless-application-model
  52. 52. http://apex.run
  53. 53. https://apex.github.io/up
  54. 54. https://github.com/claudiajs/claudia
  55. 55. https://github.com/Miserlou/Zappa
  56. 56. http://gosparta.io/
  57. 57. TESTING
  58. 58. amzn.to/29Lxuzu
  59. 59. Level of Testing 1.Unit do our objects do the right thing? are they easy to work with?
  60. 60. Level of Testing 1.Unit 2.Integration does our code work against code we can’t change?
  61. 61. handler
  62. 62. handler test by invoking the handler
  63. 63. Level of Testing 1.Unit 2.Integration 3.Acceptance does the whole system work?
  64. 64. Level of Testing unit integration acceptance feedback confidence
  65. 65. Don’t Mock Types You Can’t Change
  66. 66. Don’t Mock Types You Can’t Change Services
  67. 67. Paul Johnston The serverless approach to testing is different and may actually be easier. http://bit.ly/2t5viwK
  68. 68. Lambda, API Gateway, DynamoDB
  69. 69. Lambda, API Gateway, DynamoDB: Unit Tests
  70. 70. Lambda, API Gateway, DynamoDB: Unit Tests, Mock/Stub
  71. 71. what unit tests will not tell you… (Lambda, API Gateway, DynamoDB) is our request correct? is the request mapping set up correctly? are the API resources configured correctly? are we assuming the correct schema? is Lambda proxy configured correctly? is IAM policy set up correctly? is the table created?
  72. 72. observation: most Lambda functions are simple and have a single purpose; the risk of shipping broken software has largely shifted to how they integrate with external services
  73. 73. optimize towards shipping working software, even if it means slowing down your feedback loop…
  74. 74. learning the wrong thing faster does not help us deliver working software faster
  75. 75. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda
  76. 76. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input
  77. 77. Legacy Monolith Amazon Kinesis Amazon Lambda Amazon CloudSearch Amazon API Gateway Amazon Lambda Test Input Validate
  78. 78. integration tests exercise system’s Integration with its external dependencies my code
  79. 79. acceptance tests exercise system End-to-End from the outside my code
  80. 80. observation: integration tests differ from acceptance tests only in HOW the Lambda functions are invoked
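
The observation on slide 80 can be sketched as one test case that runs in two modes. This is a minimal sketch under stated assumptions, not the speakers' code: the `handler` function, the `TEST_MODE` and `TEST_ROOT_URL` environment variables, and the `/hello` route are hypothetical names chosen for illustration. In integration mode the handler is called in-process with a stubbed API Gateway proxy event; in acceptance mode the same request goes over HTTP to the deployed API.

```javascript
// Hypothetical Lambda handler under test (API Gateway Lambda proxy response shape).
const handler = async (event) => {
  const name = (event.queryStringParameters || {}).name || 'world';
  return { statusCode: 200, body: JSON.stringify({ message: `hello ${name}` }) };
};

// Build a minimal API Gateway proxy event for in-process invocation.
const buildEvent = (path, query) => ({
  httpMethod: 'GET',
  path,
  queryStringParameters: query || null,
  headers: {},
});

// Integration mode: call the handler directly; nothing needs to be deployed.
const viaHandler = async (path, query) => {
  const res = await handler(buildEvent(path, query));
  return { statusCode: res.statusCode, body: JSON.parse(res.body) };
};

// Acceptance mode: call the deployed endpoint over HTTP (global fetch, Node 18+).
// TEST_ROOT_URL is an assumed variable holding the deployed API's base URL.
const viaHttp = async (path, query) => {
  const url = new URL(process.env.TEST_ROOT_URL + path);
  Object.entries(query || {}).forEach(([k, v]) => url.searchParams.set(k, v));
  const res = await fetch(url);
  return { statusCode: res.status, body: await res.json() };
};

// The test picks an invoker by mode; the assertions stay the same in both modes.
const invoke = process.env.TEST_MODE === 'http' ? viaHttp : viaHandler;
```

A test would then call `invoke('/hello', { name: 'yan' })` and assert on `statusCode` and `body` without knowing which mode is active.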
  81. 81. CI + CD PIPELINE
  82. 82. end-to-end tests exercise both the system and the process by which it’s built and deployed …has to be done anyway repeatedly during the software’s lifetime…
  83. 83. Yan: the earlier you consider CI/CD, the more time you save in the long run
  84. 84. Yan: deployment scripts that only live on the CI box are a disaster waiting to happen…
  85. 85. Jenkins build config deploys and tests unit + integration tests deploy acceptance tests
  86. 86. if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
            STAGE=$2
            REGION=$3
            PROFILE=$4
            npm install
            AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s $STAGE -r $REGION
          elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
            STAGE=$2
            REGION=$3
            PROFILE=$4
            npm install
            AWS_PROFILE=$PROFILE npm run int-$STAGE
          elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
            STAGE=$2
            REGION=$3
            PROFILE=$4
            npm install
            AWS_PROFILE=$PROFILE npm run acceptance-$STAGE
          else
            usage
            exit 1
          fi
  87. 87. (same script) install Serverless Framework as dev dependency
  88. 88. (same script) install Serverless Framework as dev dependency; mitigate version conflicts
  89. 89. build.sh allows repeatable builds on both local & CI (or NPM script, or Gradle, or …)
  90. 90. Auto Auto Manual
  91. 91. nov, 2016 recap: Automate early; Reproducible locally & on CI; Use version control; Release process to fit your team
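
Slide 89 notes that the build script could just as well be an NPM script. As a rough illustration, the same dispatch logic might look like this in Node. The file name `build.js` and the function names `planCommands` and `run` are hypothetical; the commands themselves mirror the ones in the build.sh slide.

```javascript
const { spawnSync } = require('child_process');

// Translate "<action> <stage> <region> <profile>" into the commands to run.
// Returns null for anything that does not match, like the usage branch of build.sh.
const planCommands = (argv) => {
  if (argv.length !== 4) return null;
  const [action, stage, region, profile] = argv;
  const env = { AWS_PROFILE: profile };
  const install = { cmd: 'npm', args: ['install'], env: {} };
  switch (action) {
    case 'deploy':
      // Uses the locally installed Serverless Framework (a dev dependency).
      return [install, { cmd: 'node_modules/.bin/sls', args: ['deploy', '-s', stage, '-r', region], env }];
    case 'int-test':
      return [install, { cmd: 'npm', args: ['run', `int-${stage}`], env }];
    case 'acceptance-test':
      return [install, { cmd: 'npm', args: ['run', `acceptance-${stage}`], env }];
    default:
      return null;
  }
};

// Run each step in order and stop at the first failure.
const run = (argv) => {
  const steps = planCommands(argv);
  if (!steps) {
    console.error('usage: build.js {deploy|int-test|acceptance-test} <stage> <region> <profile>');
    return 1;
  }
  for (const { cmd, args, env } of steps) {
    const res = spawnSync(cmd, args, { stdio: 'inherit', env: { ...process.env, ...env } });
    if (res.status !== 0) return res.status || 1;
  }
  return 0;
};
```

To use it as a script, end the file with `process.exitCode = run(process.argv.slice(2));` and call it the same way on a laptop and on the CI box, for example `node build.js deploy dev eu-west-1 my-profile`.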
  92. 92. LOGGING
  93. 93. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now?
  94. 94. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now? (fields: UTC Timestamp, API Gateway Request Id, your log message)
  95. 95. CloudWatch Logs are too basic
  96. 96. …but you can stream them somewhere else CloudWatch Logs are too basic
  97. 97. CloudWatch Logs AWS Lambda ELK stack
  98. 98.
  99. 99. AWS CloudTrail events on resource operations CloudWatch Events
  100. 100. Serverless Framework
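
The CloudWatch Logs → Lambda → ELK flow on slide 97 relies on a function that unpacks subscription events. CloudWatch Logs delivers them to Lambda as base64-encoded, gzipped JSON under `event.awslogs.data`. The sketch below decodes that payload and splits each line into the three fields shown on slide 94. The names `decodeAwsLogs`, `parseLine`, `toDocuments` and `ship` are hypothetical, and `ship` is only a stand-in for whatever actually sends documents to Logstash or Elasticsearch.

```javascript
const zlib = require('zlib');

// Unpack the base64 + gzip envelope that CloudWatch Logs uses for subscription events.
const decodeAwsLogs = (event) => {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
};

// Split "<UTC timestamp> <request id> <message>" into fields.
// Lines that do not match (e.g. START/END/REPORT) are kept as a bare message.
const parseLine = (line) => {
  const m = /^(\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+([0-9a-f-]{36})\s+([\s\S]*)$/.exec(line);
  return m
    ? { timestamp: m[1], requestId: m[2], message: m[3].trim() }
    : { message: line.trim() };
};

// One document per log event, tagged with the log group and stream it came from.
const toDocuments = (event) => {
  const { logGroup, logStream, logEvents } = decodeAwsLogs(event);
  return logEvents.map((e) => ({ logGroup, logStream, ...parseLine(e.message) }));
};

// Placeholder shipper (assumption): replace with a real client for your log store.
const ship = async (docs) => docs.length;

// Lambda entry point subscribed to the log groups.
const handler = async (event) => ship(toDocuments(event));
```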
  101. 101. DISTRIBUTED TRACING
  102. 102. a user my followers didn’t receive my new post!
  103. 103. where could the problem be?
  104. 104. correlation IDs* * eg. request-id, user-id, yubl-id, etc.
  105. 105. wrap HTTP client & AWS SDK clients to forward captured correlation IDs
  106. 106. kinesis client http client sns client
  107. 107. use X-Ray for performance tracing
  108. 108. Amazon X-Ray
  109. 109. Amazon X-Ray
  110. 110. X-Ray traces do not span over API Gateway, or async event sources
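
Slides 104 to 106 describe capturing correlation IDs and forwarding them through wrapped clients. A minimal sketch of that idea follows. The `x-correlation-` header prefix and all function names are assumptions for illustration, not the speakers' library. The SNS helper uses the `MessageAttributes` shape that the AWS SDK expects for publish calls.

```javascript
const PREFIX = 'x-correlation-';

// Pick up correlation IDs from incoming HTTP headers.
// If no x-correlation-id is present, start a new chain from the current request ID.
const captureCorrelationIds = (headers, awsRequestId) => {
  const ids = {};
  for (const [key, value] of Object.entries(headers || {})) {
    if (key.toLowerCase().startsWith(PREFIX)) ids[key.toLowerCase()] = value;
  }
  if (!ids[`${PREFIX}id`]) ids[`${PREFIX}id`] = awsRequestId;
  return ids;
};

// Wrap an HTTP client so every outgoing request carries the captured IDs.
// `send` is whatever function actually performs the request.
const wrapHttpClient = (send, ids) => (request) =>
  send({ ...request, headers: { ...ids, ...(request.headers || {}) } });

// The same idea for SNS: forward the IDs as message attributes.
const toSnsMessageAttributes = (ids) =>
  Object.fromEntries(
    Object.entries(ids).map(([k, v]) => [k, { DataType: 'String', StringValue: String(v) }])
  );
```

Kinesis records have no attribute mechanism, so a wrapped Kinesis client would need to embed the IDs in the record payload itself.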
  111. 111. MONITORING + ALERTING
  112. 112. no place to install agents/daemons
  113. 113. • invocation Count • error Count • latency • throttling • granular to the minute • support custom metrics
  114. 114. • same metrics as CW • better dashboard • support custom metrics https://www.datadoghq.com/blog/monitoring-lambda-functions-datadog/
  115. 115. my code
  116. 116. my code
  117. 117. my code internet internet press button something happens
  118. 118. those extra 10-20ms for sending custom metrics would compound when you have microservices and multiple APIs are called within one slice of user event
  119. 119. Amazon found every 100ms of latency cost them 1% in sales. http://bit.ly/2EXPfbA
  120. 120. no more background processing, other than what the platform provides
  121. 121. logs:
              console.log("hydrating yubls from db…");
              console.log("fetching user info from user-api");
            metrics:
              console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
              console.log("MONITORING|1489795335|8|count|yubls-served");
            metric fields: MONITORING|timestamp|metric value|metric type|metric name
  122. 122. CloudWatch Logs AWS Lambda ELK stack logs metrics CloudWatch
  123. 123. don’t forget to set up dashboards & CW alarms
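
The metrics-as-logs approach on slides 121 and 122 needs two small pieces: a formatter inside the function and a parser in the log-processing function. The sketch below follows the `MONITORING|timestamp|value|type|name` layout from the slide. The function names are hypothetical, and the parser assumes it is given the message part of a log line, after the timestamp and request ID prefix have been stripped.

```javascript
// Format: MONITORING|<epoch seconds>|<metric value>|<metric type>|<metric name>
const formatMetric = (name, type, value, now = Date.now()) =>
  ['MONITORING', Math.floor(now / 1000), value, type, name].join('|');

// Writing to stdout adds no network call to the invocation.
const recordMetric = (name, type, value) => console.log(formatMetric(name, type, value));

// On the log-processing side: return a metric for MONITORING lines, null for ordinary logs.
const parseMetric = (message) => {
  const parts = message.trim().split('|');
  if (parts.length !== 5 || parts[0] !== 'MONITORING') return null;
  const [, timestamp, value, type, name] = parts;
  return { timestamp: Number(timestamp), value: Number(value), type, name };
};
```

The log-processing function can then route parsed metrics to CloudWatch and everything else to the log store, so the user-facing function never waits on a metrics API.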
  124. 124. CONFIG MANAGEMENT
  125. 125. design for easy & quick propagation of config changes
  126. 126. me Environment variables make it hard to share configurations across functions.
  127. 127. me Environment variables make it hard to implement fine-grained access to sensitive info.
  128. 128. config service goes here
  129. 129. SSM Parameter Store
  130. 130. sensitive data should be encrypted in-flight, and at-rest
  131. 131. enforce role-based access to sensitive configuration values
  132. 132. SSM Parameter Store HTTPS role-based access encrypted in-flight
  133. 133. SSM Parameter Store encrypt role-based access
  134. 134. SSM Parameter Store encrypted at-rest
  135. 135. HTTPS role-based access SSM Parameter Store encrypted in-flight
  136. 136. invest into a robust client library
  137. 137. fetch & cache at cold-start
  138. 138. invalidate at interval & weak signals
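
Slides 136 to 138 outline what the config client library should do: fetch and cache at cold start, then invalidate on an interval and on weak signals. A minimal sketch of such a client follows. `createConfigClient` and `loadParams` are hypothetical names, and `loadParams` stands in for an async call that reads values from a store such as SSM Parameter Store. Reading "weak signals" as "the caller noticed something that suggests the value changed" is an interpretation, not something the slides spell out.

```javascript
// `loadParams` is a hypothetical async function returning an object of config values.
const createConfigClient = (loadParams, { ttlMs = 3 * 60 * 1000, now = Date.now } = {}) => {
  let cache = null;   // { values, expiresAt }
  let pending = null; // in-flight refresh, shared by concurrent callers

  const isStale = () => cache === null || now() >= cache.expiresAt;

  const refresh = () => {
    if (!pending) {
      pending = loadParams()
        .then((values) => {
          cache = { values, expiresAt: now() + ttlMs };
          return values;
        })
        .finally(() => { pending = null; });
    }
    return pending;
  };

  return {
    // Fetch on first use (cold start), then serve from memory until the TTL lapses.
    get: async (key) => {
      if (isStale()) {
        try {
          await refresh();
        } catch (err) {
          // Keep serving the previous values if a refresh fails; fail only with no cache at all.
          if (cache === null) throw err;
        }
      }
      return cache.values[key];
    },
    // Weak signal: force a reload on the next read, e.g. after an auth error with a cached secret.
    invalidate: () => { if (cache) cache.expiresAt = 0; },
    isStale,
  };
};
```

Creating the client outside the handler function means the cache survives across warm invocations of the same container.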
  139. 139. that’s all for now, folks ;-)
  140. 140. API Gateway and Kinesis; Authentication & authorisation (IAM, Cognito); Testing; Running & Debugging functions locally; Log aggregation; Monitoring & Alerting; X-Ray; Correlation IDs; CI/CD; Performance and Cost optimisation; Error Handling; Configuration management; VPC; Security; Leading practices (API Gateway, Kinesis, Lambda); Canary deployments. http://bit.ly/production-ready-serverless get 40% off with: ytcui
  141. 141. @theburningmonk theburningmonk.com github.com/theburningmonk