PAC 2019 virtual Stefano Doni

Optimizing cloud configurations for price-performance

  1. 1. A needle in the haystack: optimizing cloud configurations for price-performance by Stefano Doni CTO @ akamas.io
  2. 2. The Problem
  3. 3. Cloud compute services offer overwhelming choices. EC2 instance costs range from $3.4 to $19482 per month (on demand). https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-ec2-instances-performance-optimization-best-practices-cmp307r1-aws-reinvent-2018
  4. 4. Cloud storage services provide various price and performance points. EBS costs range from $0.025 to $0.125 per GB-month, plus provisioned IOPS. https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-elastic-block-storage-amazon-ebs-stg310r1-aws-reinvent-2018
  5. 5. Cloud compute instances and storage types are interdependent. The EC2-to-EBS network can limit actual volume performance (e.g. IOPS) and become a bottleneck. https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-elastic-block-storage-amazon-ebs-stg310r1-aws-reinvent-2018
  6. 6. The current approaches
  7. 7. The model-based approach, aka cloud right sizing recommendations https://cloud.google.com/compute/docs/instances/apply-sizing-recommendations-for-instances
  8. 8. The experimental approach, aka load test your app “There is no substitute for measuring the performance of your entire application, because application performance can be impacted by the underlying infrastructure or by software and architectural limitations. We recommend application-level testing, including the use of application profiling and load testing tools and services” https://aws.amazon.com/ec2/instance-types/
  9. 9. A bigger problem: same specs, different performance across different cloud providers “CockroachDB 2.1 achieves 40% more throughput (tpmC) on TPC-C when tested on AWS using c5d.4xlarge than on GCP via n1-standard-16. We were shocked that AWS offered such superior performance” Cockroach Labs https://www.cockroachlabs.com/blog/2018_cloud_report/
  10. 10. Why can't current approaches ensure optimal application performance and low costs? ● They may not consider end-to-end application performance ● They may not capture hidden bottlenecks ● They may not capture unique application / workload behaviour ● They may not factor in cloud-specific platforms and implementations (e.g. hypervisors, CPU architectures) ● They can't scale to the sheer complexity of cloud options
  11. 11. The new AI-driven approach
  12. 12. Key capabilities: powered by AI, automated, full-stack, goal-driven
  13. 13. A new vision: continuous and self-driving optimization (diagram: Configure, Performance Test, Measure, Goal)
  14. 14. A real example: optimizing MongoDB on AWS
  15. 15. The use case. Goal: minimize price/performance of a MongoDB database hosted on AWS. Performance is the throughput of the database (queries/sec); price is the monthly AWS price for the provisioned resources. Scenario: Akamas drives automated optimization, including application load tests, with a workflow that provisions AWS EC2 and EBS resources as suggested by the AI engine. Optimization scope: AWS EC2 instances and EBS storage volumes powering MongoDB.
  16. 16. Modeling the cloud cost-optimization problem. EC2 (e.g. c5d.2xlarge): instance family (c), instance generation (5), additional capabilities (d), instance size (2xlarge). EBS (e.g. io1, 70 GB, 1000 IOPS): volume type, volume size, volume IOPS.
  17. 17. Results
  18. 18. AI-driven price-performance optimization results. Baseline configuration: price/performance of r4.large, gp2 70 GB. Best configuration: -68% price/performance after 18 experiments, or approx. 22 hours.
  19. 19. Best configuration: for the same price, 3x throughput and -90% latency. Price: -2.9%, 65.52 (best) vs 67.48 (baseline) €/month. Throughput: +205%, 7605 (best) vs 2493 (baseline) queries/sec. Latency (avg): -90%, 1330 (best) vs 14575 (baseline) milliseconds.
  20. 20. How did AI achieve that? A look at the best configuration. The best configuration for this workload is m5d.large. HW specs comparison (instance name: use cases; vCPUs; memory; instance storage; EBS block storage). r4.large (baseline): Memory optimized; 2 x Intel Xeon E5-2686; 15.25 GiB; -; gp2 70 GB. m5d.large (best): General purpose; 2 x Custom Intel Xeon Platinum 8175M; 8 GiB; 1 x 150 GB NVMe SSD; n/a.
  21. 21. AI can find unusual configurations: AMD CPUs with half the memory can cut costs and still improve throughput. Searching instances with EBS storage, the cheapest configuration for this workload is m5a.large: -24% cost with +12% throughput. HW specs comparison (instance name: use cases; vCPUs; memory; instance storage; EBS block storage). r4.large (baseline): Memory optimized; 2 x Intel Xeon E5-2686; 15.25 GiB; -; gp2 70 GB. m5a.large (cheapest): General purpose; 2 x AMD EPYC; 8 GiB; -; gp2 114 GB. (Chart: top 5 best configurations.)
  22. 22. Debunking a common myth: high resource usage != application performance bottleneck. Throughput is 12% higher for the m5a.large (cheapest) than for the r4.large (baseline) instance, despite m5a.large having half the memory of r4.large. (Charts: memory used and throughput, r4.large vs m5a.large.)
  23. 23. Conclusions
  24. 24. Takeaways ● The technology landscape is becoming more and more complex ● Traditional approaches are not effective and can't scale: significant optimization opportunities are left on the table ● AI for IT optimization is required and can deliver previously unthinkable benefits, beyond what human experts can achieve ● In the cloud, price/performance improvements of about 70% are possible by properly exploiting the choices we have ● Cloud rightsizing recommendations may suggest higher-priced options
  25. 25. Q & A
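
The price/performance goal from slide 15 and the percentages on slides 18 and 19 can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `Measurement` class and `relative_change` helper are hypothetical names, not part of Akamas, and the only data used are the baseline and best figures quoted on slide 19.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measured configuration (field names are illustrative assumptions)."""
    price_eur_month: float   # monthly AWS price of the provisioned resources
    throughput_qps: float    # database throughput in queries/sec
    latency_ms: float        # average latency in milliseconds

    @property
    def price_performance(self) -> float:
        # Lower is better: euros per month paid for each query/sec sustained.
        return self.price_eur_month / self.throughput_qps


def relative_change(best: float, baseline: float) -> float:
    """Percentage change of `best` with respect to `baseline`."""
    return (best - baseline) / baseline * 100.0


# Figures quoted on slide 19.
baseline = Measurement(price_eur_month=67.48, throughput_qps=2493, latency_ms=14575)
best = Measurement(price_eur_month=65.52, throughput_qps=7605, latency_ms=1330)

changes = {
    "price": relative_change(best.price_eur_month, baseline.price_eur_month),
    "throughput": relative_change(best.throughput_qps, baseline.throughput_qps),
    "latency": relative_change(best.latency_ms, baseline.latency_ms),
    "price/performance": relative_change(
        best.price_performance, baseline.price_performance
    ),
}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

With these inputs the price/performance ratio drops by roughly 68%, which matches the headline figure on slide 18; the other changes agree with slide 19 up to rounding.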
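
Slide 16 models a configuration as a combination of EC2 parameters (family, generation, additional capabilities, size) and EBS parameters (volume type, size, IOPS). A minimal sketch of that search space follows. The value sets are small, made-up samples chosen for illustration, the `CloudConfig` class is a hypothetical name, and the assumption that only io1 volumes take an explicit IOPS setting is a simplification. Not every generated combination exists on AWS, so a real search would need a validity filter and a pricing lookup.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Optional


@dataclass(frozen=True)
class CloudConfig:
    """One point in the search space (hypothetical model of slide 16)."""
    family: str                 # e.g. "c", "m", "r"
    generation: int             # e.g. 4, 5
    capabilities: str           # e.g. "" (none), "a" (AMD), "d" (local NVMe)
    size: str                   # e.g. "large", "2xlarge"
    volume_type: str            # e.g. "gp2", "io1"
    volume_size_gb: int
    volume_iops: Optional[int]  # set only for provisioned-IOPS volumes

    @property
    def instance_type(self) -> str:
        # EC2 names follow <family><generation><capabilities>.<size>
        return f"{self.family}{self.generation}{self.capabilities}.{self.size}"


# Illustrative sample values only; the real option lists are far larger.
FAMILIES = ("c", "m", "r")
GENERATIONS = (4, 5)
CAPABILITIES = ("", "a", "d")
SIZES = ("large", "xlarge", "2xlarge")
VOLUME_TYPES = ("gp2", "io1")
VOLUME_SIZES_GB = (70, 114)
IO1_IOPS = (1000, 2000)


def candidate_configs() -> Iterator[CloudConfig]:
    """Yield every combination of the sample parameter values."""
    for fam, gen, cap, size, vtype, vsize in product(
        FAMILIES, GENERATIONS, CAPABILITIES, SIZES, VOLUME_TYPES, VOLUME_SIZES_GB
    ):
        # Assumption: only io1 exposes IOPS as a tunable parameter here.
        iops_options = IO1_IOPS if vtype == "io1" else (None,)
        for iops in iops_options:
            yield CloudConfig(fam, gen, cap, size, vtype, vsize, iops)


configs = list(candidate_configs())
example = CloudConfig("c", 5, "d", "2xlarge", "io1", 70, 1000)
print(len(configs), example.instance_type)
```

Even this toy grid yields a few hundred candidates, which illustrates why slide 10 argues that exhaustive manual load testing cannot scale to the full set of cloud options.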
