Cost Optimization at Scale


Intended for customers who have (or will have) thousands of instances on AWS, this session is about reducing the complexity of managing costs for these large fleets so they run efficiently. Attendees will learn about common roadblocks that prevent large customers from cost optimizing, tools they can use to efficiently remove those roadblocks, and techniques to monitor their rate of cost optimization. The session will include a case study that will talk in detail about the millions of dollars saved using these techniques. Customers will learn about a range of templates they can use to quickly implement these techniques, and also partners who can help them implement these templates.

Published in: Technology

  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Keith Jarrett AWS Business Development Manager Cloud Economics July 13, 2016 Cost Optimization at Scale: Building and Realizing the Economic Case for the AWS Cloud
  2. 2. What to expect… We will introduce our approach to building the business case for moving to the cloud and share tips from some of our most innovative customers, who architect for cost optimization in order to realize the economics of the AWS Cloud.
  3. 3. In the beginning . . . …there was TCO
  4. 4. What is TCO? Definition: Comparative total cost of ownership analysis (acquisition and operating costs) for running an infrastructure environment end-to-end on-premises vs. on AWS. Used for: 1) Comparing the costs of running an entire infrastructure environment or specific workload on-premises or in a co-location facility vs. on AWS 2) Budgeting and building the business case for moving to AWS
  5. 5. So how do we do it? ≠
  6. 6. TCO = acquisition costs + operations costs. (1) Server costs: hardware (servers, rack chassis, PDUs, ToR switches, plus maintenance); software (OS, virtualization licenses, plus maintenance); facilities cost (space, power, cooling). (2) Storage costs: hardware (storage disks, SAN/FC switches); storage admin costs; facilities cost (space, power, cooling). (3) Network costs: network hardware (LAN switches, load balancers); bandwidth costs; network admin costs; facilities cost (space, power, cooling). (4) IT labor costs: server admin, virtualization admin. The diagram is illustrative and doesn’t include every cost item. For example, software costs can include database, management, and middle-tier software costs. Facilities cost can include costs associated with upgrades, maintenance, building security, taxes, and so on. IT labor costs can include security admin and application admin costs.
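The formula on this slide can be sketched as a simple sum over cost categories. This is a minimal illustration, not the AWS TCO Calculator's method; the `CostCategory` type, the helper name, and every dollar figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostCategory:
    name: str
    acquisition: float  # one-time costs, e.g. hardware and licenses (hypothetical)
    operations: float   # recurring costs over the analysis period (hypothetical)


def total_cost_of_ownership(categories):
    """TCO = acquisition costs + operations costs, summed over all categories."""
    return sum(c.acquisition + c.operations for c in categories)


# Made-up on-premises figures, for illustration only.
on_prem = [
    CostCategory("server", 100_000.0, 40_000.0),
    CostCategory("storage", 50_000.0, 15_000.0),
    CostCategory("network", 20_000.0, 10_000.0),
    CostCategory("it_labor", 0.0, 90_000.0),
]

tco = total_cost_of_ownership(on_prem)
print(f"On-premises TCO: ${tco:,.2f}")
```

Running the same function over an equivalent list of AWS cost categories would give the other side of the comparison.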
  7. 7. Resources to get you started: AWS TCO Calculator; case studies and research
  8. 8. What’s missing from a traditional TCO? Economic criteria categories: Infrastructure Comparison; Capacity Planning Benefits; Financial Benefits of Innovation; Cost Avoidance; Workforce Productivity; Accelerated Time to Value/Market; Cost to Achieve (Migration, Platform, Training); Legacy Constraints. Legend: Included, Partially Included, Not Included
  9. 9. Lowering TCO through cost optimization: On-Premises; Lift & Shift; Instance Right-Sizing; Improved Elasticity; Measure, Monitor, Improve; Optimized EC2; Storage Optimization; Serverless Architecture; Managed Services; True AWS Optimized. (Diagram label: Traditional TCO Comparisons)
  10. 10. So you’re feeling pretty good.
  11. 11. Until your CFO shows up with the bill.
  12. 12. Cost optimization is… going from “pay for what you use” to “pay for what you need”
  13. 13. Where do you start?
  14. 14. The four pillars of cost optimization: (1) Right-sizing (2) Reserved Instances (3) Increase elasticity (4) Measure, monitor, and improve
  15. 15. Right-sizing • Selecting the cheapest instance available while meeting performance requirements • Looking at CPU, RAM, storage, and network utilization to identify potential instances that can be downsized • Leveraging Amazon CloudWatch metrics and setting up custom RAM metrics. Rule of thumb: Right-size, then reserve. (But if you’re in a pinch, reserve first.)
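The first bullet above (cheapest instance that still meets performance requirements) can be sketched as a filter followed by a minimum. The catalog entries, prices, and the 20% headroom factor are all hypothetical; in practice the peak figures would come from utilization metrics such as those the slide mentions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


# Hypothetical catalog; names and prices are placeholders, not AWS pricing.
CATALOG = [
    InstanceType("small", 1, 2.0, 0.03),
    InstanceType("medium", 2, 4.0, 0.05),
    InstanceType("large", 2, 8.0, 0.12),
    InstanceType("xlarge", 4, 16.0, 0.24),
]


def right_size(catalog, peak_vcpus, peak_memory_gib, headroom=1.2) -> Optional[InstanceType]:
    """Return the cheapest type covering observed peak usage plus headroom."""
    need_cpu = peak_vcpus * headroom
    need_mem = peak_memory_gib * headroom
    candidates = [
        t for t in catalog
        if t.vcpus >= need_cpu and t.memory_gib >= need_mem
    ]
    # None signals that nothing in the catalog is large enough.
    return min(candidates, key=lambda t: t.hourly_usd, default=None)


choice = right_size(CATALOG, peak_vcpus=1.5, peak_memory_gib=3.0)
print(choice.name if choice else "no suitable instance type")
```

A fuller version would also check storage and network utilization, which the slide lists alongside CPU and RAM.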
  16. 16. Reserved Instances. Commitment level: 1 year or 3 year. AWS services offering RIs: Amazon EC2, Amazon RDS, Amazon DynamoDB, Amazon Redshift, Amazon ElastiCache. * Dependent on specific AWS service, size/type, and region
  17. 17. Reserved Instances Step 1: RI Coverage • Cover always-on resources. Step 2: RI Utilization • Leverage RI flexibility to increase utilization. • Merge and split RIs as needed. Rule of thumb: Target 70–80% always-on coverage and 95% RI utilization rate.
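The two ratios in this rule of thumb can be computed as follows. The slide does not define them, so this sketch assumes that coverage is the share of always-on instance hours paid for by RIs and that utilization is the share of purchased RI hours actually consumed. The function names and sample numbers are hypothetical.

```python
def ri_coverage(always_on_hours: float, ri_hours_used: float) -> float:
    """Assumed definition: share of always-on hours covered by RIs."""
    if always_on_hours <= 0:
        raise ValueError("always_on_hours must be positive")
    return ri_hours_used / always_on_hours


def ri_utilization(ri_hours_purchased: float, ri_hours_used: float) -> float:
    """Assumed definition: share of purchased RI hours actually consumed."""
    if ri_hours_purchased <= 0:
        raise ValueError("ri_hours_purchased must be positive")
    return ri_hours_used / ri_hours_purchased


def meets_rule_of_thumb(coverage: float, utilization: float) -> bool:
    """Slide targets: 70-80% always-on coverage and 95% RI utilization."""
    return 0.70 <= coverage <= 0.80 and utilization >= 0.95


# Hypothetical month: 100 always-on instances, 75 RIs, 720 hours each.
always_on_hours = 100 * 720
ri_hours_purchased = 75 * 720
ri_hours_used = 52_000

coverage = ri_coverage(always_on_hours, ri_hours_used)
utilization = ri_utilization(ri_hours_purchased, ri_hours_used)
print(f"coverage={coverage:.1%} utilization={utilization:.1%} "
      f"on target={meets_rule_of_thumb(coverage, utilization)}")
```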
  18. 18. Increase elasticity Turn off nonproduction instances • Look for dev/test, nonproduction instances that are running always-on and turn them off. Autoscale production • Use Auto Scaling to scale up and down based on demand and usage (for example, spikes). Rule of thumb: Shoot for 20–30% of Amazon EC2 instances running on demand to be able to handle elasticity needs.
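To see why turning off nonproduction instances matters, the sketch below compares an always-on month with a business-hours schedule. The fleet size, hourly rate, and schedule are hypothetical; the 720-hour month matches the assumption the deck uses elsewhere.

```python
HOURS_PER_MONTH = 720  # same monthly-hours assumption the deck uses


def monthly_cost(instance_count: int, hourly_usd: float, run_hours: float) -> float:
    """On-demand cost of a fleet that runs run_hours in the month."""
    return instance_count * hourly_usd * run_hours


# Hypothetical dev/test fleet: 40 instances at $0.10/hr.
fleet_size = 40
rate = 0.10

# Hypothetical schedule: 12 hours a day on 22 working days.
scheduled_hours = 12 * 22

always_on = monthly_cost(fleet_size, rate, HOURS_PER_MONTH)
scheduled = monthly_cost(fleet_size, rate, scheduled_hours)
savings = always_on - scheduled
savings_share = savings / always_on

print(f"always-on ${always_on:,.2f}, scheduled ${scheduled:,.2f}, "
      f"saved ${savings:,.2f} ({savings_share:.0%})")
```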
  19. 19. Using right-sizing and elasticity to lower cost. More smaller instances vs. fewer larger instances: 29 m4.large @ $0.12/hr = $2,505.60/mo*; 59 t2.medium @ $0.052/hr = $2,208.96/mo*. *Assumes Linux instances in the US-East (N. Virginia) Region at 720 hours per month
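The monthly figures on this slide follow from count × hourly rate × 720 hours. The sketch below reproduces them with `Decimal` to avoid floating-point rounding. The rates are the 2016 figures quoted on the slide, and the helper name is hypothetical.

```python
from decimal import Decimal

HOURS_PER_MONTH = 720  # stated in the slide footnote


def fleet_monthly_cost(count: int, hourly_usd: str) -> Decimal:
    """Monthly on-demand cost: instance count x hourly rate x hours per month."""
    return Decimal(count) * Decimal(hourly_usd) * HOURS_PER_MONTH


fewer_larger = fleet_monthly_cost(29, "0.12")    # 29 m4.large
more_smaller = fleet_monthly_cost(59, "0.052")   # 59 t2.medium
difference = fewer_larger - more_smaller

print(f"29 m4.large:  ${fewer_larger:.2f}/mo")
print(f"59 t2.medium: ${more_smaller:.2f}/mo")
print(f"difference:   ${difference:.2f}/mo "
      f"({difference / fewer_larger:.1%} lower)")
```

The comparison covers cost only. Whether 59 t2.medium instances deliver equivalent capacity to 29 m4.large depends on the workload.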
  20. 20. Putting it all together: case study
  21. 21. Challenge: Minimizing unit costs during a period of massive growth. Unit of measure: elastic compute unit (ECU), a consistent measure of CPU processing power
  22. 22. The growth challenge. August 2014: 584 ECU. August 2015: 1,192 ECU. 2x YoY compute growth, with a 33% decrease in monthly EC2 costs!
  23. 23. Solving the growth challenge
  24. 24. Step 1: Right-size and update instances. m1 on demand: $0.07 per ECU. c4 on demand: $0.02 per ECU
  25. 25. The impact of right-sizing 70% reduction in unit cost
  26. 26. Step 2: Reserve
  27. 27. The impact of reservations: 30% reduction in unit cost
  28. 28. Putting it together 85% reduction in unit cost!
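As a rough check on how the two steps combine, successive percentage reductions compound multiplicatively rather than add. The sketch below uses only the rounded figures shown on the slides, so it lands near 79% rather than the 85% quoted above; the deck does not show the unrounded inputs behind that number. The function names are hypothetical.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction when a unit cost moves from before to after."""
    return 1 - after / before


def compound(*reductions: float) -> float:
    """Combine successive fractional reductions multiplicatively."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1 - r
    return 1 - remaining


# Step 1, from the per-ECU on-demand prices on the slide (m1 -> c4).
step1 = reduction(0.07, 0.02)

# Steps 1 and 2 combined, using the rounded slide figures (70% and 30%).
combined = compound(0.70, 0.30)

print(f"step 1 reduction: {step1:.1%}")
print(f"combined reduction from rounded figures: {combined:.1%}")
```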
  29. 29. Sounds pretty easy, right? Not really. In reality, it is very complex. • Scale • Behavioral change • Visibility • Ownership
  30. 30. Cost optimization governance (Remember the fourth pillar?)
  31. 31. Uncovering the cost optimization opportunities 1. Auto-tag resources. 2. Identify always-on nonprod. 3. Identify instances to downsize. 4. Recommend RIs to purchase. 5. Dashboard our status. 6. Report on savings.
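Steps 1 to 3 of this list can be sketched against a simple inventory. Everything here is hypothetical: the `Instance` record, the `env` and `owner` tag keys, the 95% always-on threshold, and the 20% CPU cutoff. A real implementation would pull these fields from billing and monitoring data.

```python
from dataclasses import dataclass

HOURS_IN_WINDOW = 720  # one 30-day reporting window


@dataclass
class Instance:
    instance_id: str
    tags: dict
    hours_running: float  # hours up during the window
    avg_cpu_pct: float    # average CPU utilization over the window


def untagged(instances, required=("env", "owner")):
    """Step 1 helper: instances missing any required tag."""
    return [i for i in instances if any(k not in i.tags for k in required)]


def always_on_nonprod(instances, threshold=0.95):
    """Step 2: dev/test instances up for nearly the whole window."""
    return [
        i for i in instances
        if i.tags.get("env") in {"dev", "test"}
        and i.hours_running >= threshold * HOURS_IN_WINDOW
    ]


def downsize_candidates(instances, max_avg_cpu_pct=20.0):
    """Step 3: instances whose average CPU suggests a smaller size."""
    return [i for i in instances if i.avg_cpu_pct < max_avg_cpu_pct]


# Hypothetical fleet.
fleet = [
    Instance("i-a", {"env": "dev", "owner": "team1"}, 720, 5.0),
    Instance("i-b", {"env": "prod", "owner": "team2"}, 720, 65.0),
    Instance("i-c", {"env": "test", "owner": "team1"}, 200, 35.0),
    Instance("i-d", {}, 720, 10.0),
]

print("untagged:", [i.instance_id for i in untagged(fleet)])
print("always-on nonprod:", [i.instance_id for i in always_on_nonprod(fleet)])
print("downsize candidates:", [i.instance_id for i in downsize_candidates(fleet)])
```

Note that the untagged instance never shows up in the nonprod list, which is one reason tagging comes first in the slide's ordering.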
  32. 32. AWS options
  33. 33. Reserved Instances and right-sizing options
  34. 34. Example: reasonable optimization dashboard
  35. 35. Creating a culture of cost transparency: targets and metrics; Cloud Competency Center; AWS Enterprise Support
  36. 36. Eric Shoemaker, Vice President, Global Sales, CloudHealth® Technologies. July 13, 2016. Cloud Service Management: Building and Realizing an Economic Case for the Cloud
  37. 37. Yesterday: tried & true processes. Continuity of operations; optimal processes; sustainable costs; proper technology
  38. 38. Today: what’s changed? Infrastructure ownership is increasingly distributed to individual business groups, which • gain increasing control over infrastructure • go “rogue” to achieve business agility • do not always take into account the importance of governance, compliance, and risk management. IT is increasingly an influencer/auditor instead of an owner
  39. 39. Active policy management and automation • Implement governance and automation: Define policies to manage cost, usage, performance, and security to optimize and empower LOBs through centralized governance. • Improve reservation management: Model, manage, and modify reservations, and customize recommendations based on budget. • Optimize performance right-sizing: Leverage integrated data sources, define policies, and improve resource utilization by workload. • Simplify management: Establish guidelines, automate tasks, and provide recommendations.
  40. 40. Real-world example: The devil is in the details. Challenge: How to manage and optimize infrastructure performance for thousands of instances and more than 10 PB of Amazon S3 storage that support a global real estate marketplace? Needs: • Consolidated view across all infrastructure • Eliminate zombie and underutilized assets • Usage analysis tied to business groups • Performance analysis by workload, department, and business groups • More easily leverage reservations
  41. 41. Solution and results. Solution: CloudHealth Cost Summary, RI Management and Health Check Pulse reports, and performance right-sizing paired with integrated Datadog metrics for recommendations on where to downgrade and/or optimize infrastructure. Result: More than 30% improved performance. Benefits realized from EC2 right-sizing!
  42. 42. Real-world example: What’s in your cloud…$$$$ Challenge: How to manage extremely dynamic infrastructure with hundreds of EC2 instances and better leverage reservations? Needs: • Visibility across an extremely dynamic and complex infrastructure • Track and report resource utilization by business groups • Right-size underutilized instances • Model, modify, and purchase reservations to improve utilization
  43. 43. Solution and results. Solution: CloudHealth delivered visibility across all infrastructure, with functional business group reporting that aligned infrastructure costs with usage and performance. The analysis simplified reporting on reservation utilization and provided recommendations for reservation modifications and purchases. Result: More than 150 employee hours and $75,000. Benefits realized from RI modifications alone!
  44. 44. Real-world example: Automation to simplify cloud service management. Challenge: How to promote agility across individual business groups while also implementing centralized governance? Needs: • More efficiently utilize resources • Terminate unused volumes • Create rules to evaluate criteria for cost savings opportunities and reservation modifications • Minimize infrastructure usage during non-working hours • Improve infrastructure deployment compliance (names and tags)
  45. 45. Solution and results. Solution: Implemented CloudHealth Automated Tasks via policies for notification, termination, and start/stop, covering non-compliant infrastructure deployment, budget overages, weekend usage lights-out rules, and reservation modifications. Result: More than 21%, the amount saved in the last year!
  46. 46. Managing the cloud for improved efficiency • Visibility from integrated reporting • Optimization through integrated recommendations • Governance with active policy management
  47. 47. Simplifying Cloud Migration
  48. 48. Learn more about how Cloud Service Management can help you! Download our free eBook: Cloud Service Management For Smart People Stop by Booth #615
  49. 49. Putting it all together
  50. 50. Where to start • Set up a Cloud Competency Center • Bring in the right tools • Use metrics to reinforce behavior • Use partners to accelerate!
  51. 51. Cycle of cost optimization
  52. 52. Thank You!