
Serverless in Production, an experience report (cloudXchange)

AWS Lambda has changed the way we deploy and run software, but this new serverless paradigm also brings new twists to old problems: how do you test a cloud-hosted function locally? How do you monitor it? What about logging and config management? And how do you start migrating from an existing architecture?

In this talk Yan and Scott will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.

  1. Serverless in production, an experience report: what you should know before you go to production
  2. Yan Cui, http://theburningmonk.com, @theburningmonk
  3. April 2016
  4. “hey guys, vote on this post and I’ll announce a winner at 10PM tonight”
  5-6. 10PM traffic: a 70-100x spike
  7. EC2 scaling is slow, so scale earlier: low utilisation to leave room for spikes
  8. lots of $$$ for unused resources
  9. up to 30 mins for deployment; deployment required downtime
  10. DEPLOYMENTS SHOULD... be small, be fast, have zero downtime, have no lock-step
  11. FEATURES SHOULD... be deployable independently, be loosely-coupled
  12. WE WANT TO... minimise cost for unused resources, minimise ops effort, reduce tech mess, deliver visible improvements faster
  13. November 2016
  14. 170 Lambda functions in prod; 1.2 GB of deployment packages in prod; 95% cost saving vs EC2; 15x the number of prod releases per month
  15-17. timeline: is a good fit → 1st function in prod! → ?
  18. ALERTING, CI / CD, TESTING, LOGGING, MONITORING
  19. timeline: is a good fit → 1st function in prod! → ? → ? → 170 functions
  20. SECURITY, DISTRIBUTED TRACING, CONFIG MANAGEMENT
  21. evolving the PLATFORM
  22. rebuilt search
  23-24. Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch; Amazon API Gateway → AWS Lambda
  25. Rebuilt with Lambda
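A minimal sketch of the indexing function in the middle of this pipeline, under stated assumptions: Lambda receives Kinesis records whose payloads are base64-encoded in `event.Records[i].kinesis.data`, and the function turns them into a CloudSearch document batch of `add`/`delete` operations. The shape of the events published by the monolith (`eventType`, `postId`, `text`) and the event type names are hypothetical, and uploading the batch to the CloudSearch domain is left out.

```javascript
// Sketch of a Kinesis-triggered indexing function.
// Hypothetical event shape published by the monolith: { eventType, postId, text }.

// Kinesis hands Lambda each record's payload as base64-encoded bytes.
function decodeRecord(record) {
  const json = Buffer.from(record.kinesis.data, "base64").toString("utf8");
  return JSON.parse(json);
}

// Map one domain event to one CloudSearch document batch operation.
function toSearchOperation(evt) {
  switch (evt.eventType) {
    case "post_created": // hypothetical event type names
    case "post_updated":
      return { type: "add", id: evt.postId, fields: { text: evt.text } };
    case "post_deleted":
      return { type: "delete", id: evt.postId };
    default:
      return null; // events the search index does not care about
  }
}

// Build the document batch; uploading it to CloudSearch is out of scope here.
function buildSearchBatch(kinesisEvent) {
  return kinesisEvent.Records
    .map(decodeRecord)
    .map(toSearchOperation)
    .filter((op) => op !== null);
}
```

Keeping the mapping a pure function of the decoded events makes it easy to unit test; whether the batch actually lands in CloudSearch is left to integration and acceptance tests.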
  26. getting PRODUCTION READY
  27. choose a tried-and-tested deployment framework, don’t invent your own
  28. http://serverless.com
  29. https://github.com/awslabs/serverless-application-model
  30. TESTING
  31. amzn.to/29Lxuzu
  32. Levels of testing (1) Unit: do our objects do the right thing? are they easy to work with?
  33. Levels of testing (2) Integration: does our code work against code we can’t change?
  34-35. handler: test by invoking the handler
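Testing by invoking the handler means calling the function's entry point with a realistic event, just as Lambda would, rather than reaching into its internals. A minimal sketch, assuming a Node.js async handler behind an API Gateway Lambda proxy integration; the `/search` handler and the `invokeHandler` helper are hypothetical. In a real integration test the handler would talk to its real dependencies instead of returning a canned response.

```javascript
// Hypothetical handler behind an API Gateway Lambda proxy integration.
const handler = async (event, context) => {
  // API Gateway passes null when the request has no query string.
  const params = event.queryStringParameters || {};
  const query = params.q;
  if (!query) {
    return { statusCode: 400, body: JSON.stringify({ message: "missing q" }) };
  }
  // In the real function, this is where CloudSearch would be queried.
  return {
    statusCode: 200,
    body: JSON.stringify({ query, requestId: context.awsRequestId }),
  };
};

// Test helper: build a proxy-style event, invoke the handler in-process
// the way Lambda would, then parse the JSON response body.
async function invokeHandler(queryStringParameters) {
  const event = { httpMethod: "GET", path: "/search", queryStringParameters };
  const context = { awsRequestId: "test-request-id" }; // minimal fake context
  const response = await handler(event, context);
  return { statusCode: response.statusCode, body: JSON.parse(response.body) };
}
```

Because the test goes through the handler's public signature, it exercises the same contract that Lambda and API Gateway use in production.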
  36. Levels of testing (3) Acceptance: does the whole system work?
  37. Levels of testing: unit → integration → acceptance; moving towards acceptance, feedback gets slower and confidence gets higher
  38. “…We find that tests that mock external libraries often need to be complex to get the code into the right state for the functionality we need to exercise. The mess in such tests is telling us that the design isn’t right but, instead of fixing the problem by improving the code, we have to carry the extra complexity in both code and test…” (Don’t Mock Types You Can’t Change)
  39. “…The second risk is that we have to be sure that the behaviour we stub or mock matches what the external library will actually do… Even if we get it right once, we have to make sure that the tests remain valid when we upgrade the libraries…” (Don’t Mock Types You Can’t Change)
  40. Don’t Mock Services You Can’t Change (“Types” replaced with “Services”)
  41. Paul Johnston: “The serverless approach to testing is different and may actually be easier.” http://bit.ly/2t5viwK
  42-44. API Gateway → Lambda → DynamoDB: unit tests cover the Lambda function, with API Gateway and DynamoDB mocked/stubbed
  45. what unit tests will not tell you: are the API resources configured correctly? is the request mapping set up correctly? is the Lambda proxy configured correctly? is the IAM policy set up correctly? is the table created? is our request correct? are we assuming the correct schema?
  46. observation: most Lambda functions are simple and have a single purpose, so the risk of shipping broken software has largely shifted to how they integrate with external services
  47. optimize towards shipping working software, even if it means slowing down your feedback loop…
  48. “…Wherever possible, an acceptance test should exercise the system end-to-end without directly calling its internal code. An end-to-end test interacts with the system only from the outside: through its interface…” (Testing End-to-End)
  49-51. the search architecture again (Legacy Monolith → Amazon Kinesis → AWS Lambda → Amazon CloudSearch; Amazon API Gateway → AWS Lambda), now with a Test Input step and a Validate step
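Because the search pipeline is asynchronous (a test input passes through Kinesis and Lambda before it shows up in CloudSearch), the Validate step cannot expect the result to be there immediately. One way to handle this, sketched here as an assumption rather than the authors' code, is to poll with a timeout; `waitUntil` and its options are hypothetical names.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll an async check until it returns a truthy value or the timeout expires.
// In an acceptance test, `check` might query the search API for the document
// that the test input published to Kinesis (hypothetical usage).
async function waitUntil(check, { timeoutMs = 30000, intervalMs = 1000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

The timeout keeps a broken pipeline from hanging the CI build, at the cost of slower feedback when it fails.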
  52. integration tests exercise the system’s integration with its external dependencies
  53. acceptance tests exercise the system end-to-end, from the outside
  54. observation: integration tests differ from acceptance tests only in HOW the Lambda functions are invoked
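Since the two kinds of test differ only in how the function is invoked, the same test cases can be reused by switching the invocation mode. A minimal sketch, assuming a hypothetical mode setting (`handler` or `http`), a hypothetical `apiRoot` for the deployed API, and Node 18+ for the global `fetch`; the HTTP transport is injectable so the sketch runs without AWS.

```javascript
// Choose how a test step invokes the function under test:
//   "handler": call the Lambda handler in-process (integration test)
//   "http":    call the deployed endpoint through API Gateway (acceptance test)
function makeInvoker({ mode, handler, apiRoot, httpGet = defaultHttpGet }) {
  if (mode === "handler") {
    return async (path, query) => {
      const event = { httpMethod: "GET", path, queryStringParameters: query };
      const res = await handler(event, { awsRequestId: "test" });
      return { statusCode: res.statusCode, body: JSON.parse(res.body) };
    };
  }
  if (mode === "http") {
    return async (path, query) =>
      httpGet(apiRoot + path + "?" + new URLSearchParams(query));
  }
  throw new Error(`unsupported test mode: ${mode}`);
}

// Default transport: the global fetch available in Node 18+.
async function defaultHttpGet(url) {
  const res = await fetch(url);
  return { statusCode: res.status, body: await res.json() };
}
```

In a test suite the mode would typically come from the environment, e.g. `makeInvoker({ mode: process.env.TEST_MODE, handler, apiRoot: process.env.API_ROOT })` (both variable names hypothetical), so the same assertions run in-process as integration tests and against a deployed stage as acceptance tests.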
  55. CI + CD PIPELINE
  56. me: deployment scripts that only live on the CI box are a disaster waiting to happen…
  57. Jenkins build config deploys and tests: unit + integration tests → deploy → acceptance tests
  58-60. build.sh, which installs the Serverless framework as a dev dependency to mitigate version conflicts:
```shell
#!/usr/bin/env bash
# usage: ./build.sh <deploy|int-test|acceptance-test> <stage> <region> <aws_profile>
set -e

usage() {
  echo "usage: $0 <deploy|int-test|acceptance-test> <stage> <region> <aws_profile>" >&2
}

if [ "$1" = "deploy" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  # use the project-local Serverless framework (a dev dependency), not a global install
  AWS_PROFILE=$PROFILE 'node_modules/.bin/sls' deploy -s "$STAGE" -r "$REGION"
elif [ "$1" = "int-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  # runs the "int-<stage>" script defined in package.json
  AWS_PROFILE=$PROFILE npm run "int-$STAGE"
elif [ "$1" = "acceptance-test" ] && [ $# -eq 4 ]; then
  STAGE=$2
  REGION=$3
  PROFILE=$4

  npm install
  # runs the "acceptance-<stage>" script defined in package.json
  AWS_PROFILE=$PROFILE npm run "acceptance-$STAGE"
else
  usage
  exit 1
fi
```
  61. build.sh allows repeatable builds on both local machines & CI
  62. Auto, Auto, Manual
  63. LOGGING
  64-65. 2016-07-12T12:24:37.571Z 994f18f9-482b-11e6-8668-53e4eab441ae GOT is off air, what do I do now? (annotated: UTC timestamp, Lambda request ID, your log message)
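Downstream tooling can split such lines back into their parts. A minimal sketch, assuming the layout above (timestamp, request ID, message, separated by whitespace; the Node.js runtime uses tabs). Newer runtimes may also print a log level such as `INFO` before the message; this sketch leaves it inside `message`. `parseLogLine` is a hypothetical name.

```javascript
// Matches: <ISO-8601 UTC timestamp> <36-char request ID> <message>
const LOG_LINE =
  /^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)\s+([0-9a-f-]{36})\s+([\s\S]*)$/;

// Split a default Lambda log line into timestamp, request ID and message.
// Returns null for lines that do not match (e.g. START/END/REPORT lines).
function parseLogLine(line) {
  const match = LOG_LINE.exec(line);
  if (!match) return null;
  const [, timestamp, requestId, message] = match;
  return { timestamp: new Date(timestamp), requestId, message };
}
```

Once the request ID is a separate field, a log aggregator can filter every line written during one invocation.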
  66. me: logs are not easily searchable in CloudWatch Logs.
  67-68. CloudWatch Logs → AWS Lambda → ELK stack
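In this pipeline, the shipping function receives log events through a CloudWatch Logs subscription, which delivers them to Lambda as gzip-compressed, base64-encoded JSON under `event.awslogs.data`. A minimal sketch of the decoding step, using only Node's built-in `zlib`; sending the entries on to the ELK stack is left out, and `decodeSubscriptionEvent` is a hypothetical name.

```javascript
const zlib = require("zlib");

// Decode a CloudWatch Logs subscription event into individual log entries.
function decodeSubscriptionEvent(event) {
  const compressed = Buffer.from(event.awslogs.data, "base64");
  const payload = JSON.parse(zlib.gunzipSync(compressed).toString("utf8"));

  // CloudWatch Logs also sends CONTROL_MESSAGE events to check the destination;
  // they carry no application logs.
  if (payload.messageType !== "DATA_MESSAGE") return [];

  return payload.logEvents.map((logEvent) => ({
    logGroup: payload.logGroup, // e.g. /aws/lambda/<function name>
    logStream: payload.logStream,
    timestamp: logEvent.timestamp, // epoch milliseconds
    message: logEvent.message,
  }));
}
```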
  69. DISTRIBUTED TRACING
  70. a user: my followers didn’t receive my new post!
  71. where could the problem be?
  72. correlation IDs (e.g. request-id, user-id, yubl-id, etc.)
  73. wrap HTTP client & AWS SDK clients to forward captured correlation IDs
  74. Kinesis client, HTTP client, SNS client
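A minimal sketch of the idea: capture correlation IDs from the incoming request, then wrap each client so it forwards them. The `x-correlation-` header prefix and the `__context` field added to Kinesis/SNS payloads are assumed conventions for illustration, and the clients are injected functions rather than the real AWS SDK or HTTP clients.

```javascript
// Assumed convention: correlation IDs travel as headers prefixed "x-correlation-".
const PREFIX = "x-correlation-";

// Capture correlation IDs from an incoming API Gateway event's headers,
// falling back to the Lambda request ID if the caller did not send one.
function captureCorrelationIds(event, awsRequestId) {
  const ids = {};
  for (const [name, value] of Object.entries(event.headers || {})) {
    const key = name.toLowerCase();
    if (key.startsWith(PREFIX)) ids[key] = value;
  }
  if (!ids[PREFIX + "id"]) ids[PREFIX + "id"] = awsRequestId;
  return ids;
}

// Wrap an HTTP client so every outgoing request carries the IDs as headers.
function wrapHttpClient(send, ids) {
  return (request) => send({ ...request, headers: { ...request.headers, ...ids } });
}

// Wrap a Kinesis/SNS-style publisher: with no headers to use,
// embed the IDs in the payload instead.
function wrapPublisher(publish, ids) {
  return (payload) => publish({ ...payload, __context: ids });
}
```

Each downstream function would call `captureCorrelationIds` (or read `__context` from the record) on the way in, so the same IDs appear in every log line along the call chain.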
  75-76. AWS X-Ray
  77. X-Ray traces did not span API Gateway or async event sources (at the time of this talk)
  78. MONITORING + ALERTING
  79. no place to install agents/daemons
  80. built-in metrics: invocation count, error count, latency, throttling; granular to the minute; custom metrics supported
  81-83. diagram: internet → my code → internet (“press button, something happens”)
  84. logs and metrics, both written with console.log (metric lines: MONITORING|timestamp|metric value|metric type|metric name):
```javascript
// ordinary log lines
console.log("hydrating yubls from db…");
console.log("fetching user info from user-api");
// metric lines
console.log("MONITORING|1489795335|27.4|latency|user-api-latency");
console.log("MONITORING|1489795335|8|count|yubls-served");
```
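A small helper keeps the metric format consistent across functions. A sketch, assuming the `MONITORING|timestamp|value|type|name` layout annotated on the slide, with the timestamp in Unix epoch seconds; `formatMetric`, `recordMetric` and `timed` are hypothetical names.

```javascript
// Format a metric as a log line: MONITORING|<epoch seconds>|<value>|<type>|<name>
function formatMetric(type, name, value, date = new Date()) {
  const epochSeconds = Math.floor(date.getTime() / 1000);
  return `MONITORING|${epochSeconds}|${value}|${type}|${name}`;
}

// Write the metric to stdout; Lambda sends stdout to CloudWatch Logs.
function recordMetric(type, name, value) {
  console.log(formatMetric(type, name, value));
}

// Time an async operation and record its latency in milliseconds,
// whether the operation succeeds or throws.
async function timed(name, fn) {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    recordMetric("latency", name, Date.now() - start);
  }
}
```

For example, with a hypothetical `fetchUserInfo` call: `const user = await timed("user-api-latency", () => fetchUserInfo(userId));`. Writing metrics to the log avoids a synchronous call to CloudWatch on every invocation.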
  85. CloudWatch Logs → AWS Lambda → logs to the ELK stack, metrics to CloudWatch
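On the other side, the log-processing function has to separate metric lines from ordinary logs and turn them into CloudWatch metric data. A minimal sketch that builds objects shaped like the `MetricDatum` structure accepted by CloudWatch's `PutMetricData` (`MetricName`, `Timestamp`, `Value`, `Unit`); the type-to-unit mapping (`latency` as milliseconds, `count` as a count) is an assumption, and the `PutMetricData` call itself is omitted.

```javascript
// Assumed mapping from the metric type in the log line to a CloudWatch unit.
const UNITS = { latency: "Milliseconds", count: "Count" };

// Parse "MONITORING|<epoch seconds>|<value>|<type>|<name>" into a
// MetricDatum-shaped object, or return null for an ordinary log line.
// The raw message may be prefixed with the timestamp and request ID.
function parseMetric(message) {
  const start = message.indexOf("MONITORING|");
  if (start < 0) return null;
  const parts = message.slice(start).trim().split("|");
  if (parts.length !== 5) return null;
  const [, epochSeconds, value, type, name] = parts;
  const numericValue = Number(value);
  if (!Number.isFinite(numericValue)) return null;
  return {
    MetricName: name,
    Timestamp: new Date(Number(epochSeconds) * 1000),
    Value: numericValue,
    Unit: UNITS[type] || "None",
  };
}

// Split a batch of log messages into metrics (for CloudWatch) and logs (for ELK).
function partition(messages) {
  const metrics = [];
  const logs = [];
  for (const message of messages) {
    const datum = parseMetric(message);
    if (datum) metrics.push(datum);
    else logs.push(message);
  }
  return { metrics, logs };
}
```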
  86. don’t forget to set up dashboards & CloudWatch alarms
  87. CONFIG MANAGEMENT
  88. sensitive data should be encrypted in-flight and at-rest
  89. enforce role-based access to sensitive configuration values
  90-93. SSM Parameter Store: role-based access; values encrypted at-rest, and fetched over HTTPS so they are encrypted in-flight
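Fetching every value from SSM Parameter Store on every invocation adds latency and API calls, so one option is to cache values in memory (which survives between warm invocations) and refresh them after a TTL. A minimal sketch; `fetchParameter` is injected and would, in real code, call SSM's `GetParameter` with `WithDecryption: true` so SecureString values come back decrypted. The cache helper and its five-minute default TTL are assumptions, not the authors' code.

```javascript
// Cache configuration values in memory and refresh each one after ttlMs.
// `now` is injectable so expiry can be tested without waiting.
function createConfigCache(fetchParameter, { ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
  const cache = new Map(); // parameter name -> { value, expiresAt }

  return async function getConfig(name) {
    const cached = cache.get(name);
    if (cached && cached.expiresAt > now()) return cached.value;

    // Cache miss or expired entry: fetch (and decrypt) the current value.
    const value = await fetchParameter(name);
    cache.set(name, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

The TTL trades freshness for fewer calls: a rotated secret is picked up within one TTL without redeploying the function.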
  94. topics covered at http://bit.ly/production-ready-serverless: API Gateway and Kinesis; authentication & authorisation (IAM, Cognito); testing; running & debugging functions locally; log aggregation; monitoring & alerting; X-Ray; correlation IDs; CI/CD; performance and cost optimisation; error handling; configuration management; VPC; security; leading practices (API Gateway, Kinesis, Lambda); canary deployments (get 40% off with: ytcui)
  95. @theburningmonk, theburningmonk.com, github.com/theburningmonk
