Reducing Cost & Maximizing Efficiency: Tightening the Belt on AWS (CPN211) | AWS re:Invent 2013

This session dives deep into techniques used by successful customers who optimized their use of AWS. Learn tricks and hear tips you can implement right away to reduce waste, choose the most efficient instance, and fine-tune your spending, often with improved performance and a better end-customer experience. We showcase innovative approaches and demonstrate easily-applicable methods for cost optimizing Amazon EC2, Amazon S3, and a host of other services to save you time and money.

    Presentation Transcript

    • CPN211 - Reducing Cost and Maximizing Efficiency: Tightening the Belt on AWS • Tom Johnston - Business Development Manager, Amazon Web Services • Sean Simpson - Director of Operations, Stitcher, Inc. • Kingsley Wood - Business Development Manager, Amazon Web Services • Ashay Padwal - CTO, Vserv.mobi • November 15, 2013 • © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
    • Introductions and Outline • Tom Johnston (AWS): Reducing Cost and Spending Smart • Sean Simpson (Stitcher): Moving to AWS – A Story • Kingsley Wood (AWS): Maximizing Efficiency and Cost Optimization • Ashay Padwal (vServ.mobi): A Spot Case Study
    • Reducing Cost and Spending Smart Tom Johnston – Business Development Manager, AWS
    • Fundamentals • Explicit Objectives • Match Instances with Workloads • Match Scale & Use with Demand • Match Purchasing with Utilization • Governance Matters
    • Objectives: AWS provides you the ability to match your architecture to your objectives
    • Instance types • Start: Choose an instance that best meets your basic requirements; match memory & virtual cores • Tune: Change instance size up or down based upon monitoring; use CloudWatch & Trusted Advisor to assess
    • Know your usage [diagram: Instance → PUT custom metrics (free memory, free CPU, free HDD, …) at 1-min intervals → Amazon CloudWatch (2 weeks) → Alarm]
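
The custom-metrics flow on the slide above can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes the boto3 SDK (which postdates the talk), and the namespace `Custom/System`, the metric names, and the helper names are hypothetical. Free memory is passed in as a parameter because reading it portably is outside the scope of the sketch.

```python
import shutil
from datetime import datetime, timezone


def build_metric_data(instance_id, free_disk_bytes, free_memory_bytes):
    """Build a MetricData list in the shape CloudWatch PutMetricData accepts.

    Metric names here are hypothetical examples, not AWS-defined metrics.
    """
    now = datetime.now(timezone.utc)
    dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    readings = {
        "FreeDiskBytes": free_disk_bytes,
        "FreeMemoryBytes": free_memory_bytes,
    }
    return [
        {
            "MetricName": name,
            "Dimensions": dimensions,
            "Timestamp": now,
            "Value": float(value),
            "Unit": "Bytes",
        }
        for name, value in readings.items()
    ]


def publish(metric_data, namespace="Custom/System"):
    """Send the payload to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the payload builder works without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=metric_data
    )


if __name__ == "__main__":
    free_disk = shutil.disk_usage("/").free
    # 0 is a placeholder for a real free-memory reading.
    payload = build_metric_data("i-0123456789abcdef0", free_disk, 0)
    print([m["MetricName"] for m in payload])
    # publish(payload)  # uncomment on an instance with credentials; run once a minute
```

Scheduling the script once a minute (for example from cron) would match the 1-min interval on the slide; an alarm on the resulting metric is configured separately in CloudWatch.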
    • [Chart: instance families plotted by memory (GB) against processing ability, from "More Memory" to "More Processing": High-Mem Cluster Compute, High Storage, High I/O, High Mem, Cluster Compute, M3, C3, M1, High-CPU]
    • Instance types • Start: Choose an instance that best meets your basic requirements; match memory & virtual cores • Tune: Change instance size up or down based upon monitoring; use CloudWatch & Trusted Advisor to assess • Roll-Out: Run multiple instances in multiple Availability Zones
    • Choose your metric, optimize for the metric • Cost per unit of work per instance (size) • Workload A: optimal on 4x m1.xlarge • Workload B: optimal on 10x m1.medium • Workload C: optimal on 2x m3.xxlarge
    • Choose your metric, optimize for the metric • Cost per unit of work per instance (size) • 100 concurrent jobs on 10 x m1.large @ $0.26 / hr = $0.026 / job vs. 300 concurrent jobs on 10 x m3.xlarge @ $0.58 / hr = $0.019 / job
    • Choose your metric, optimize for the metric • Think workload density • Don’t just focus on instance hourly rate
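
The cost-per-job comparison on the slides above is simple enough to check directly. The sketch below uses only the figures quoted on the slide; the function name is illustrative.

```python
def cost_per_job(instance_count, hourly_rate, concurrent_jobs):
    """Hourly fleet cost divided by the number of jobs the fleet runs at once."""
    return instance_count * hourly_rate / concurrent_jobs


# Figures as quoted on the slide.
m1_large = cost_per_job(instance_count=10, hourly_rate=0.26, concurrent_jobs=100)
m3_xlarge = cost_per_job(instance_count=10, hourly_rate=0.58, concurrent_jobs=300)

print(f"m1.large : ${m1_large:.3f} per job")
print(f"m3.xlarge: ${m3_xlarge:.3f} per job")
```

The m3.xlarge fleet costs more than twice as much per hour, yet each job is cheaper because three times as many jobs fit on it. That is the "workload density" point.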
    • Server Load by hour of day (0 to 24) [chart, built up over several slides: load curve against "Capacity of 1 Server" and "Traditional capacity required"; capacity is then added in blocks labeled "1 Server for 8 hours"; result: 1/3rd Saving]
    • Instance Count (0 to 6) by Day of Month (0 to 30) [chart, built up over several slides: monthly predictable peak processing; "Traditional capacity required" compared with "Elastic Capacity"; result: 75% Savings]
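
The savings figure on the chart above comes from comparing capacity provisioned for the peak with capacity that follows demand. A minimal sketch of that comparison follows. The demand profile is hypothetical: the chart's underlying data is not recoverable from the transcript, so the values below were chosen only to fit the chart's axes (peak of 6 instances, 30 days) and to land on the quoted 75%.

```python
def elastic_savings(daily_instance_demand):
    """Fraction of instance-days saved by following demand instead of
    provisioning for the peak every day."""
    peak = max(daily_instance_demand)
    traditional = peak * len(daily_instance_demand)  # peak capacity, all month
    elastic = sum(daily_instance_demand)             # only what each day needs
    return 1 - elastic / traditional


# Hypothetical month: 1 instance on 27 days, 6 instances on 3 peak days.
demand = [1] * 27 + [6] * 3

print(f"Instance-days, traditional: {max(demand) * len(demand)}")
print(f"Instance-days, elastic:     {sum(demand)}")
print(f"Savings: {elastic_savings(demand):.0%}")
```

The shorter and taller the peak relative to the baseline, the larger the saving from elastic capacity.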
    • Reserved instances • On-demand instances: Unix/Linux instances start at $0.02/hour; pay as you go for compute power; low cost and flexibility; pay only for what you use, no up-front commitments or long-term contracts. Use cases: applications with short term, spiky, or unpredictable workloads; application development or testing
    • Reserved instances • Reserved instances: 1- or 3-year terms; pay low up-front fee, receive significant hourly discount; low cost / predictability; helps ensure compute capacity is available when needed. Use cases: applications with steady state or predictable usage; applications that require reserved capacity, including disaster recovery
    • Reserved instances • Heavy utilization RI: > 80% utilization; lower costs up to 58%. Use cases: databases, large scale HPC, always-on infrastructure, baseline
    • Reserved instances • Medium utilization RI: 41-79% utilization; lower costs up to 49%. Use cases: web applications, many heavy processing tasks, running much of the time
    • Reserved instances • Light utilization RI: 15-40% utilization; lower costs up to 34%. Use cases: disaster recovery, weekly / monthly reporting, Elastic MapReduce
    • Best RI for Utilization [chart: cost, $0 to $18,000, for On-Demand compared with Heavy, Medium, and Light utilization RIs]
    • Optimizing costs with RIs [chart: y-axis 0 to 14, x-axis 1 to 24; stacked layers for Heavy utilization RI, Medium Utilization RI, Light Utilization RI, and On Demand]
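
Matching purchasing to utilization comes down to a break-even calculation: an RI adds an up-front fee but lowers the hourly rate, so it pays off only above some utilization. The sketch below shows the shape of that calculation. All prices are hypothetical placeholders (the slides give discount ranges, not rates), and the model assumes the reserved hourly rate is charged only for hours actually used.

```python
HOURS_PER_YEAR = 8760


def on_demand_cost(rate, utilization, hours=HOURS_PER_YEAR):
    """Cost of running on demand for a fraction of the term."""
    return rate * utilization * hours


def reserved_cost(upfront, rate, utilization, hours=HOURS_PER_YEAR):
    """Up-front fee plus the discounted hourly rate for the hours used."""
    return upfront + rate * utilization * hours


def break_even_utilization(upfront, od_rate, ri_rate, hours=HOURS_PER_YEAR):
    """Utilization above which the reserved option becomes cheaper."""
    return upfront / ((od_rate - ri_rate) * hours)


# Hypothetical prices for a 1-year term; substitute current published rates.
OD_RATE, RI_UPFRONT, RI_RATE = 0.10, 200.0, 0.04

threshold = break_even_utilization(RI_UPFRONT, OD_RATE, RI_RATE)
print(f"Break-even utilization: {threshold:.1%}")
for u in (0.2, 0.5, 0.9):
    od = on_demand_cost(OD_RATE, u)
    ri = reserved_cost(RI_UPFRONT, RI_RATE, u)
    print(f"{u:.0%} utilized: on-demand ${od:.2f}, reserved ${ri:.2f}")
```

Running the same comparison per RI type, with real prices, reproduces the kind of "best RI for utilization" chart shown on the slides.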
    • Spot instances • On-demand instances: Unix/Linux instances start at $0.02/hour; pay as you go for compute power; low cost and flexibility; pay only for what you use, no up-front commitments or long-term contracts. Use cases: applications with short term, spiky, or unpredictable workloads; application development or testing • Reserved instances: 1- or 3-year terms; pay low up-front fee, receive significant hourly discount; low cost / predictability; helps ensure compute capacity is available when needed. Use cases: applications with steady state or predictable usage; applications that require reserved capacity, including disaster recovery • Spot instances: bid on unused EC2 capacity; Spot Price based on supply/demand, determined automatically; cost / large scale, dynamic workload handling. Use cases: applications with flexible start and end times; applications only feasible at very low compute prices
    • Governance Matters • Who can create and launch instances? • Who checks that only needed instances are running? • Have specific policies • Use AWS tools such as IAM to help enforce them
    • Checklist • Identify your goals • Understand your workload & match to instances • Scale up and down with demand • Align purchasing methods & utilization • Have governance appropriate to your goals • Change in goals & workload will drive change in use of AWS
    • Moving to AWS – A Story Sean Simpson Director of Operations - Stitcher, Inc.
    • What is Stitcher? • Stitcher is to news and talk radio what Pandora is to music • Stitcher is a content aggregator • Stitcher is an on-demand service • Stitcher is deployed on mobile, CE, and automotive platforms
    • Stitcher by the Numbers • 12 million downloads • 20,000+ shows • Over 1 million hours of listening weekly • Over 100 TB outbound data monthly
    • With Growth Comes Pain • DRBD database locked us into hardware • Sublease of colocation facility restricted our access to our servers • Server leases and purchases constrained our architecture • Growth inhibited by human, server, and vendor resources
    • What options did we consider? • Move to another colocation facility • Move to a cloud provider • Move to a hybrid colocation/cloud provider
    • Why we chose Amazon Web Services • Familiarity – Already using Amazon Simple Storage Service for our RSS feeds – Already experimenting with Amazon Elastic Compute Cloud – Recently implemented Amazon Simple Queue Service
    • Why we chose Amazon Web Services • Flexibility / Scalability – Ability to adjust resources quickly in our production environment – Ability to create any number of environments – Ability to design servers as we wanted with respect to operating systems, systems software, etc.
    • Why we chose Amazon Web Services • Cost – Cost matches usage – Bandwidth savings when using Amazon CloudFront as our CDN – Many resources to assist in optimization – Put simply, we got our solution for the lowest quote
    • Why we chose Amazon Web Services • Documentation & Customer Service – Knowledgeable solutions architects – “Right-level” documentation – Quick response to our needs
    • Architecting Change • Ask yourself: What are we trying to achieve? • Know yourself, know your systems • Consider industry best practices (but don’t blindly follow them) • Read the documentation
    • Use Puppet or Chef • Configuration management tools are both enabling and liberating • Build, destroy, and build again • Write once, build many • Nuances between node types are managed with clearly written rules • Naming conventions are your friend
    • Our Architecture
    • Looks nice, but what does it do? • High Availability • Scalability • Security • Performance • Cost effectiveness
    • The Results – Database connections/sec: Before 225, After 450
    • The Results – GetStationPlaylist(): Before 0.75, After 0.1
    • The Results – Maximum throughput: Before 5000, After 20000
    • The Results – Downtime: Before 1200, After 15
    • Cost Optimization Results • Twice the results for the same money
    • How we save money • Reserved instances • Appropriate instance types • CloudFront CDN • Rapid reorganization using the API • Monitor utilization • Load test • Housecleaning
    • On Deck Cost Savings • Spot instances for processing tasks • Auto Scaling • In-app optimizations • Instance type tuning
    • Parting Advice • Architect for 10X • Take the time to get it right the first time (or at least, close enough) • Plan on continuous evolution of systems
    • Maximizing Efficiency and Cost Optimization Kingsley Wood – Business Development Manager, AWS
    • Considerations • Offloading – reduce footprint • Utilization – your biggest lever • Managed Services – leverage RDS, SQS, SES • Consolidated Billing – pooling resources • Flexible Evolution – continually revisit • Spot Instances – think big, new possibilities
    • OFFLOAD all static content • reduce your compute demand and costs • improve end-user experience • increase reliability and durability
    • ENTIRE SITE via CloudFront • minimize client-server chatter (keep it at the edge) • reduce server-database traffic (cache the common calls) • speed up mobile app response (persistent connections)
    • Real World Example • Standard Setup: 4 x Medium Instances $485; AWS Data Transfer 1 TB $194; Total = $679 • Optimized: 1 x Medium Instance $121; CloudFront Data 1 TB $168; CloudFront Requests $1.89; Total = $291 • 57% Lower Cost + 6X Faster
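
The totals and the 57% figure on the slide above can be reproduced from its line items. The sketch below uses only the slide's numbers; the dictionary layout is illustrative.

```python
# Monthly line items as quoted on the slide (USD).
standard = {
    "4 x Medium Instances": 485.00,
    "AWS Data Transfer 1 TB": 194.00,
}
optimized = {
    "1 x Medium Instance": 121.00,
    "CloudFront Data 1 TB": 168.00,
    "CloudFront Requests": 1.89,
}

standard_total = sum(standard.values())
optimized_total = sum(optimized.values())
reduction = 1 - optimized_total / standard_total

print(f"Standard:  ${standard_total:.2f}")
print(f"Optimized: ${optimized_total:.2f}")
print(f"Reduction: {reduction:.0%}")
```

Most of the saving comes from dropping three of the four instances; the CloudFront transfer charge is only modestly lower than the direct data-transfer charge it replaces.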
    • Offloading Tips • Leverage S3, CloudFront, Route 53 • Eliminate repeated calls (edge and data cache) • Static website hosting on S3: no web server at all! • Minimize your EC2 and database footprint: stand up Read Replicas for variable loads
    • Utilization and Auto-Scaling: Granularity • more small instances vs. fewer large instances • 29 Large @ $0.32/hr = $9.28 • 59 Small @ $0.08/hr = $4.72
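
The granularity point above is that smaller instances let capacity track demand in finer steps, so less of each paid hour is idle. The sketch below illustrates this with the slide's hourly rates. Two things are assumptions: that one Large does the work of four Smalls (suggested by the 4x price ratio, not stated on the slide), and the hourly demand profile, which is hypothetical and merely happens to reproduce the slide's 29 and 59 instance-hours.

```python
SMALL_RATE = 0.08      # $/hr, from the slide
LARGE_RATE = 0.32      # $/hr, from the slide
SMALLS_PER_LARGE = 4   # assumption: capacity scales with price


def instance_hours(hourly_demand, capacity_per_instance):
    """Instance-hours needed when each hour rounds up to whole instances."""
    return sum(-(-d // capacity_per_instance) for d in hourly_demand)


# Hypothetical day, in units of one Small instance's capacity:
# 19 quiet hours needing 1 unit, 5 busy hours needing 8 units.
demand = [1] * 19 + [8] * 5

small_hours = instance_hours(demand, 1)
large_hours = instance_hours(demand, SMALLS_PER_LARGE)

print(f"{large_hours} Large @ ${LARGE_RATE}/hr = ${large_hours * LARGE_RATE:.2f}")
print(f"{small_hours} Small @ ${SMALL_RATE}/hr = ${small_hours * SMALL_RATE:.2f}")
```

During the quiet hours a Large runs at a quarter of its capacity, which is where the extra cost comes from; during busy hours the two options cost the same.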
    • Utilization – Trigger Actions by Event Leverage CloudWatch to collect and measure metrics
    • Buuuk for Singapore Press Holdings (SPH)
    • The Straits Times Mobile App: REAL-TIME reaction response • notification of pending News Flash (with audible alarm) • on-demand ramp up of capacity (6 mins) • subscriber alert push delivered • mass response traffic handled (followed by ramp down)
    • Architecture Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud. These qualities have been designed into our services both by handling such aspects without any special action by you and by providing features that must be used explicitly and correctly.
    • Managed Services Reduce:
    • Managed Services • Amazon Relational Database Service (RDS) • Amazon ElastiCache • Amazon Simple Queue Service (SQS) • Elastic Load Balancing • Amazon Elastic MapReduce • Amazon Simple Email Service (SES) • Amazon Simple Notification Service (SNS)
    • Elastic Load Balancing at $0.028 per hour [diagram: DNS, Elastic Load Balancer, Web Servers, Availability Zone] vs. $0.08 per hour (small instance) [diagram: DNS, EC2 instance + software LB, Web Servers, Availability Zone]
    • SQS queue at $0.50 per 1,000,000 Requests ($0.0000005 per Request) [diagram: Producer, SQS queue, Consumers] vs. $0.08 per hour (small instance) [diagram: Producer, EC2 instance + software queue, Consumers]
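
The two managed-service comparisons above can be turned into rough break-even numbers. The sketch below uses only the prices quoted on the slides and deliberately ignores everything else (data-processing charges, redundancy of the self-managed instance, and operational effort), so it is an approximation rather than a full cost model.

```python
SQS_PRICE_PER_REQUEST = 0.50 / 1_000_000  # $0.50 per million requests
SMALL_INSTANCE_HOURLY = 0.08              # self-managed queue or load balancer
ELB_HOURLY = 0.028


def sqs_hourly_cost(requests_per_hour):
    """Request charges for one hour of SQS traffic."""
    return requests_per_hour * SQS_PRICE_PER_REQUEST


def sqs_break_even_requests_per_hour():
    """Request rate at which SQS charges equal one small instance-hour."""
    return SMALL_INSTANCE_HOURLY / SQS_PRICE_PER_REQUEST


def elb_saving_fraction():
    """Hourly saving of ELB relative to one small instance."""
    return 1 - ELB_HOURLY / SMALL_INSTANCE_HOURLY


print(f"SQS break-even: {sqs_break_even_requests_per_hour():,.0f} requests/hour")
print(f"SQS at 10,000 requests/hour: ${sqs_hourly_cost(10_000):.4f}/hour")
print(f"ELB hourly saving vs. small instance: {elb_saving_fraction():.0%}")
```

Below the break-even request rate the queue costs less than a single small instance, before counting the second instance a self-managed queue would need for availability.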
    • Consolidated Billing
    • RI Purchases to grow a Resource Pool [chart: y-axis 0 to 35, x-axis 1 to 12; stacked series A, B, C, D, E forming a Reserved Instance Pool]
    • Tiered Pricing
    • Flexibility: Take advantage! Architecture vs. Gardening • STOP/START • size changes • new instance types • vary capacity • rearrange, etc.
    • What are Spot Instances? • Value – Pricing: up to 92% discount – Elastic: capacity not otherwise available – Minimum commitment: commit to 1 hour • Tradeoff – Potential for interruption
    • Key Points about Spot • Spare capacity – supply and demand • Be prepared for no availability at times • Be willing to accept and deal with interruption • Far greater potential scale starting at 5X default instance limits • Massive possible capacity = new ideas…
    • Consider 2 Time-to-Value Scenarios • 1) Value of results quickly diminishes, e.g., Engineering simulations • 2) Value of result stable until deadline, e.g., Analytics before an M&A deal
    • Spot Applications • Ideal Applications: Batch Processing; Time-Delayable; Fault-Tolerant or Restartable; Compute-Intensive; Horizontally Scalable; Stateless Worker Nodes; Region and AZ Independent; Uses Deployment Automation • Less Ideal Applications: Interactive; Strict/Tight SLA for Completion; Expensive to Handle Terminations; Data-Intensive; In-Memory Scaling; Long-Running Worker Nodes; Requires a Single AZ; Manually Launched and Managed
    • Spot Advice and Tips • Don’t build your reliability ENTIRELY on spot (vServ.mobi: exceptional and smart architecture) • With time flexibility, different approaches: delayed results, lower cost; spend less, quicker answers • Ask different questions: with enormous capacity, what is now possible?
    • Look at the World Differently • Order of magnitude more capacity • New experiments enabled = innovation! • Lucky Oyster – recommendation exchange • Prototyping a new search technology idea (using Common Crawl) • 3.4 billion web pages > 1 TB of data > Index of 400 million entities • “The cost? About $100... in about 14 hours”
    • A Spot Case Study Ashay Padwal CoFounder & CTO – vServ.mobi
    • GLOBAL INNOVATION FOCUSED Award Winning Mobile Ad Exchange across Emerging Markets
    • 31 Bn Ad Requests / Month • Over 200 Mn Unique Users / Month • By region: India 33%, SE Asia 14%, Middle East & Africa 14%, Europe 11%, Rest of Asia 11%, South America 10%, North America 7%
    • Infrastructure: Requirements & Challenges • 1. Requirement: Self Serve for Publisher On-boarding & Exit. Challenge: No Capacity Planning; Extreme Scalability • 2. Requirement: Start Up. Challenge: No Capex, no Lock-in • 3. Requirement: Least Latency & High Availability. Challenge: Suite of services – Compute, Load Balancing, DNS, CDN, Storage, Multiple DCs per location • 4. Requirement: Global Setup management with small team. Challenge: Availability across Regions with extensive APIs
    • Infrastructure: Solution • 1. AWS • 2. AWS • 3. EC2 & ELB – Multi-AZ; Route53, CloudFront, S3 • 4. US East, US West, Europe, South America, Asia. For Middle East, we host in Turkey. For Africa, we host in South Africa
    • Deployment Overview
    • Ad Delivery Setup
    • Now What? Reduce Cost without impacting Performance • AWS is pretty cost-effective. But we were greedy! • Saving more meant more money for other areas in our business. • We walked in the opposite direction... and it worked! • We use spot instances in production extensively. • Sounds risky? - Yes, but if you architect your system correctly, you should be safe.
    • What we did • 1. Selected the right Instance Type – use CloudWatch for CPU & memory usage – Load Test • 2. Designed our servers to be self-sufficient and perishable – Business logic & DB on same server – Transaction Logs written to EBS – Auto Setup on Server – Data Collection module • 3. We built a custom Scaling solution – Add/Remove instances by checking present traffic & predicting traffic in the immediate future – Based on trending of spot prices either try launching spot or fall back to on-demand instances – Remove servers if in use between 45-55min – Track spot prices to shift to on-demand
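
Two of the scaling rules on the slide above lend themselves to a small sketch: removing servers only late in their paid hour, and choosing between spot and on-demand from the recent spot price trend. This is not vServ.mobi's implementation, which the slides do not show. It assumes hourly billing from launch time, reads "45-55min" as minutes into the current billing hour, and uses a hypothetical trend heuristic with a hypothetical `headroom` parameter.

```python
from datetime import datetime, timedelta


def minutes_into_billing_hour(launch_time, now):
    """Whole minutes elapsed in the instance's current billed hour."""
    elapsed_minutes = (now - launch_time).total_seconds() // 60
    return int(elapsed_minutes % 60)


def safe_to_remove(launch_time, now, window=(45, 55)):
    """Remove a surplus server only late in an hour that is already paid for,
    leaving a few minutes of margin before the next hour starts."""
    start, end = window
    return start <= minutes_into_billing_hour(launch_time, now) <= end


def choose_market(recent_spot_prices, on_demand_price, headroom=0.8):
    """Hypothetical heuristic: use spot unless the price has reached the
    on-demand rate, or is rising and within `headroom` of it."""
    latest = recent_spot_prices[-1]
    rising = len(recent_spot_prices) > 1 and latest > recent_spot_prices[0]
    if latest >= on_demand_price:
        return "on-demand"
    if rising and latest >= headroom * on_demand_price:
        return "on-demand"
    return "spot"


launched = datetime(2013, 11, 15, 10, 0)
print(safe_to_remove(launched, launched + timedelta(minutes=170)))  # minute 50
print(safe_to_remove(launched, launched + timedelta(minutes=140)))  # minute 20
print(choose_market([0.020, 0.021, 0.020], on_demand_price=0.10))
print(choose_market([0.050, 0.070, 0.090], on_demand_price=0.10))
```

A production version would also need the traffic prediction the slide mentions and a fallback for when a spot request goes unfulfilled; both are left out here.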
    • What AWS did • 1. Reduced pricing for EC2 (On Demand & Reserved) and S3 • 2. Cheap Archival System – Glacier • 3. Pre-warming of Load Balancer (ELB) • 4. AMI movement across regions • 5. ELB with equal distribution of traffic across instances spread in any Availability Zone
    • THANK YOU! Ashay Padwal CTO & Co-Founder ashay@vserv.mobi
    • Closing – Key Takeaways • Re-evaluate, revisit and re:Invent: evolve along with AWS • Leverage Managed Services, CloudWatch • Stay up to date: RI modifications, Trusted Advisor • AWS Blog: aws.typepad.com
    • Please give us your feedback on this presentation CPN211 As a thank you, we will select prize winners daily for completed surveys!