Large Scale Load Testing Amazon.com’s Traffic on AWS (CPN102) | AWS re:Invent 2013

It’s 4am and you don’t know it, but you’re about to get three times the traffic you were expecting. Is your service ready to handle it? Systems are only as scalable as their weakest component. Large-scale load testing in production is the best (and surest) way to ensure that services can truly scale to the unexpected. But the load generator itself can be difficult to scale, expensive to run on hundreds or thousands of hosts, challenging to secure, and time-consuming to develop. The Amazon.com retail site is one of the most heavily used sites in the world, and it has to be ready for anything, at any time. How do you design a load test for this in record time while keeping it cost-effective? Well, you use AWS! Come learn best practices for using Amazon SQS, Amazon S3, Amazon EC2, Amazon CloudWatch, Auto Scaling, and Amazon DynamoDB to design horizontally scalable, large-scale load tests that can simulate the load that millions of users put on your site. We met a tight schedule and did it under budget thanks to AWS, and you can too!

Published in: Technology, Business

  1. Large-Scale Load Testing Amazon.com’s Traffic on AWS. Carlos Arguelles, Amazon.com. November 15, 2013. © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. What I’d like you to get out of this: Load and performance issues cost
  3. What I’d like you to get out of this
  4. What I’d like you to get out of this: How you can leverage AWS for load and stress tests
  5. About me
  6. Amazon.com retail site: Amazon.com receives a LOT of traffic
  7. Amazon.com retail site: Significant fluctuation throughout the day (not to scale)
  8. Amazon.com retail site: Significant fluctuation throughout the year (not to scale)
  9. Amazon.com retail site: Significant growth year to year (not to scale)
  10. Some load-related issues can
  11. CPU usage on our fleet (chart labels: 1st test (cancelled), 2nd test (successful), regular day (off-peak); 100%, 85%, 50%)
  12. Some load-related issues can only
  13. Diagram labels: Ingestion Fleet, Amazon S3, Hadoop, Database, Amazon DynamoDB, Output Fleet
  14. Some load-related issues cannot
  15. Disk usage (chart labels: Start load…, 5%, 20%, 5 hours)
  16. What do you really want to do? Performance Testing, Load Testing, Resilience Testing, Stress Testing
  17. Load Testing
  18. Stress Testing
  19. Resilience Testing
  20. Performance Testing
  21. How does AWS help us?
  22. Generating load: Replays from real-world traffic; Artificial rate, blend of operations
  23. Most useful AWS design pattern, ever
  24. Distributing load, the hard way (diagram: a Master drives four Slaves toward a 12,000 TPS total; each Slave is labelled 3000 TPS, then one Slave shows 0 TPS and the other three show 4000 TPS)
  25. Distributing load, the easy way (diagram labels: two Controllers, a row of Jobs, a row of Workers)
  26. Replaying traffic to generate load (diagram labels: Test Data Repository, two Controllers, a row of Jobs, a row of Workers, Metrics & Dashboards, Service under test)
  27. Test Data Repository: Amazon S3 for storing data, Amazon DynamoDB for indexing. Controllers. Jobs: Amazon SQS for state, resilience. Reactive auto scaling based on queue size. Workers: Amazon EC2 & Auto Scaling for hardware. Metrics & Dashboards: Amazon CloudWatch.
  28. Generating load: Replays from real-world traffic; Artificial rate, blend of operations
  29. Artificial traffic to generate load. Why? You do not have real-world data; you expect a change in traffic. How? Control rate; control blend; control duration.
  30. Artificial traffic to generate load. Phases: 50,000 TPS for 20 minutes (99% Read, 1% Writes); 85,000 TPS for 45 minutes (90% Read, 10% Writes); 95,000 TPS for 3 hours (80% Read, 20% Writes). The first phase is split by minute (Minute#1 … Minute#20: 50,000 TPS, 99% / 1%), and each minute into jobs 1, 2, … 5000 of 10 TPS for 1 minute, 99% R 1% W.
  31. Artificial traffic to generate load (diagram labels: two Controllers, a row of Jobs, a row of Workers)
  32. Amazon EC2 Spot Instances. A great way to inexpensively test: up to 90% off regular price (name your price); interruption-tolerant, time-flexible tasks. Approaches: combine with on-demand instances (burst); try Spot Instances first, then fall back to on-demand.
  33. Takeaways
  34. Please give us your feedback on this presentation (CPN102). As a thank you, we will select prize winners daily for completed surveys!
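
Slides 24 to 27 contrast a master that pushes a fixed rate to each host with controllers that enqueue small jobs for workers to pull. The sketch below is one possible illustration of the pull model, not the speaker's implementation: an in-process `queue.Queue` stands in for Amazon SQS, threads stand in for Amazon EC2 hosts, and the names `controller`, `worker`, `run` and `fail_after` are hypothetical. The point it tries to show is that when one worker stops, the remaining workers drain the queue without anyone recomputing per-host rates.

```python
import queue
import threading


def controller(jobs: queue.Queue, n_jobs: int) -> None:
    """Enqueue n_jobs small, identical units of work (stand-in for SQS messages)."""
    for job_id in range(n_jobs):
        jobs.put(job_id)


def worker(name, jobs, done, lock, fail_after=None):
    """Pull jobs until the queue is empty; optionally 'die' after fail_after jobs."""
    handled = 0
    while True:
        if fail_after is not None and handled >= fail_after:
            return  # simulated host loss; unclaimed jobs stay on the queue
        try:
            job_id = jobs.get_nowait()
        except queue.Empty:
            return
        # A real worker would send requests to the service under test here.
        with lock:
            done.append((name, job_id))
        handled += 1


def run(n_jobs=100, n_workers=4, failing_worker_jobs=5):
    """Fill the queue, start the workers, and return (worker, job) pairs."""
    jobs = queue.Queue()
    controller(jobs, n_jobs)
    done, lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=worker,
            # Assumption for the demo: worker-0 is the host that fails early.
            args=(f"worker-{i}", jobs, done, lock,
                  failing_worker_jobs if i == 0 else None),
        )
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


if __name__ == "__main__":
    completed = run()
    print(len(completed), "jobs completed")
```

In this sketch a worker only fails between jobs. With Amazon SQS, a message that was received but never deleted becomes visible again after its visibility timeout, which is one plausible reading of "state, resilience" on slide 27.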

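Slide 30 describes splitting a load profile into phases, each phase into minutes, and each minute into 5000 jobs of 10 TPS (5000 × 10 TPS = 50,000 TPS). A minimal sketch of that expansion follows, assuming a hypothetical `Phase`/`Job` representation and a hypothetical `expand_schedule` helper. The talk does not show its actual job format, so the field names here are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Phase:
    tps: int        # target transactions per second for the whole fleet
    minutes: int    # how long the phase lasts
    read_pct: int   # share of reads; the rest are writes


@dataclass(frozen=True)
class Job:
    minute: int     # 1-based minute of the overall test
    tps: int        # rate this single job should generate for one minute
    read_pct: int


JOB_TPS = 10  # per-job rate shown on slide 30


def expand_schedule(phases: Iterable[Phase], job_tps: int = JOB_TPS) -> Iterator[Job]:
    """Yield one-minute jobs whose rates sum to each phase's target TPS."""
    minute = 0
    for phase in phases:
        full_jobs, remainder = divmod(phase.tps, job_tps)
        for _ in range(phase.minutes):
            minute += 1
            for _ in range(full_jobs):
                yield Job(minute, job_tps, phase.read_pct)
            if remainder:
                # Assumption: a leftover rate below job_tps gets its own smaller job.
                yield Job(minute, remainder, phase.read_pct)


if __name__ == "__main__":
    # The three phases listed on slide 30.
    schedule = [Phase(50_000, 20, 99), Phase(85_000, 45, 90), Phase(95_000, 180, 80)]
    first_minute = [j for j in expand_schedule(schedule[:1]) if j.minute == 1]
    print(len(first_minute), "jobs in minute 1")
```

Because `expand_schedule` is a generator, a controller could enqueue jobs as they are produced rather than holding the whole schedule in memory.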