CPN211 - Reducing Cost and Maximizing
Efficiency: Tightening the Belt on AWS
Tom Johnston - Business Development Manager, ...
Introductions and Outline
• Tom Johnston (AWS)
Reducing Cost and Spending Smart
• Sean Simpson (Stitcher)
Moving to AWS – ...
Reducing Cost
and
Spending Smart
Tom Johnston – Business Development Manager, AWS
Fundamentals

• Explicit Objectives
• Match Instances with Workloads
• Match Scale & Use with Demand
• Match Purchasing...
Objectives
Objectives

AWS provides you the ability to
match your architecture to your
objectives
Instance types

Start
Choose an instance
that best meets your
basic requirements
Match memory & virtual
cores
Instance types

Start

Tune

Choose an instance
that best meets your
basic requirements

Change instance size up
or down b...
Know your usage

Instance

Free Memory
Free CPU
Free HDD
…
Custom Metrics
…
At 1-min
intervals

PUT

2 weeks

Amazon
Cloud...
More
Memory
Memory (GB)

High-Mem
Cluster
Compute

High
Storage
High
I/O

High
Mem

Cluster
Compute

M3
C3

M1
High-CPU

P...
Instance types

Start

Tune

Roll-Out

Choose an instance
that best meets your
basic requirements

Change instance size up...
Choose your metric
optimize for the metric
Choose your metric
optimize for the metric
Cost per unit of work per instance(size)
Workload A

Workload B

Workload C

Op...
Choose your metric
optimize for the metric
Cost per unit of work per instance (size)

100 concurrent jobs on 10 x m1.large...
Choose your metric
optimize for the metric
Think workload density
Don’t just focus on instance hourly rate
Server Load
[Chart: server load by hour of day (0 to 24), with "Capacity of 1 Server" and "Traditional capacity required" marked; demand is covered by blocks of "1 Server for 8 hours", for a 1/3rd saving]
[Chart: instance count (0 to 6) by day of month (0 to 30), showing a monthly predictable processing peak; "Elastic Capacity" against "Traditional capacity required" gives 75% savings]
Reserved instances

On-demand instances
Unix/Linux instances start at
$0.02/hour
Pay as you go for compute power
Low cost ...
Reserved instances

On-demand instances

Reserved instances

Unix/Linux instances start at
$0.02/hour

1- or 3-year terms
...
Reserved instances

Heavy utilization RI

On-demand instances

Reserved instances

Unix/Linux instances start at
$0.02/hou...
Reserved instances

Heavy utilization RI
> 80% utilization
Lower costs up to 58%

On-demand instances

Reserved instances
...
Best RI for Utilization
[Chart: cost from $- to $18,000 for Heavy, Medium, and Light utilization RIs against O-De...]
Optimizing costs with RIs
[Chart: On Demand, Light Utilization RI, Medium Utilization RI, Heavy utilization RI ...]
Spot instances

On-demand instances

Reserved instances

Spot instances

Unix/Linux instances start at
$0.02/hour

1- or 3...
Governance Matters
• Who can create and launch instances?
• Who checks that only needed instances are
running?
• Have spec...
Checklist
• Identify your goals
• Understand your workload & match to instances
• Scale up and down with demand
• Ali...
Moving to AWS – A Story
Sean Simpson
Director of Operations - Stitcher, Inc.
What is Stitcher?
• Stitcher is to news and talk radio what Pandora
is to music
• Stitcher is a content aggregator
• Stitc...
Stitcher by the Numbers
• 12 million downloads
• 20,000+ shows
• Over 1 million hours of listening weekly
• Over 100 TB o...
With Growth Comes Pain
• DRBD database locked us into hardware
• Sublease of colocation facility restricted our
access to ...
What options did we consider?
• Move to another colocation facility
• Move to a cloud provider
• Move to a hybrid colocati...
Why we chose Amazon Web Services
• Familiarity
– Already using Amazon Simple Storage Service for our RSS
feeds
– Already e...
Why we chose Amazon Web Services
• Flexibility / Scalability
– Ability to adjust resources quickly in our production envir...
Why we chose Amazon Web Services
• Cost
– Cost matches usage
– Bandwidth savings when using Amazon CloudFront as our ...
Why we chose Amazon Web Services
• Documentation & Customer Service
– Knowledgeable solutions architects
– “Right-level” d...
Architecting Change
• Ask yourself: What are we trying to achieve?
• Know yourself, know your systems
• Consider industry ...
Use Puppet or Chef
• Configuration management tools are both
enabling and liberating
• Build, destroy, and build again
• W...
Our Architecture
Looks nice, but what does it do?
• High Availability
• Scalability
• Security
• Performance
• Cost effectiveness
The Results – Database connections/sec
Before: 225
After: 450
The Results – GetStationPlaylist()
Before: 0.75
After: 0.1
The Results – Maximum throughput
Before: 5000
After: 20000
The Results – Downtime
Before: 1200
After: 15
Cost Optimization Results
• Twice the results for the same money
How we save money
• Reserved instances
• Appropriate instance types
• CloudFront CDN
• Rapid reorganization using t...
On Deck Cost Savings
• Spot instances for processing tasks
• Auto Scaling
• In-app optimizations
• Instance type tuning
Parting Advice
• Architect for 10X
• Take the time to get it right the first time (or at
least, close enough)
• Plan on co...
Maximizing Efficiency
and
Cost Optimization
Kingsley Wood – Business Development Manager, AWS
Considerations
• Offloading – reduce footprint
• Utilization – your biggest lever
• Managed Services – leverage RDS...
OFFLOAD all static content
• reduce your compute demand and costs
• improve end-user experience
• increase reliability and...
ENTIRE SITE via CloudFront
• minimize client-server chatter (keep it at the edge)
• reduce server-database traffic (cache ...
Real World Example
Standard Setup

Optimized

• 4 x Medium Instances
$485
• AWS Data Transfer 1 TB
$194

• 1 x Medium Inst...
Offloading Tips
• Leverage S3, CloudFront, Route 53
• Eliminate repeated calls (edge and data cache)
• Static website host...
Utilization and Auto-Scaling: Granularity
more small instances vs. fewer large instances
29 Large @
$0.32/hr
= $9.28
59 Sma...
Utilization – Trigger Actions by Event
Leverage CloudWatch to collect and measure metrics
Buuuk for Singapore Press Holdings (SPH)
The Straits Times Mobile App
REAL-TIME reaction response
• notification of pending News Flash (with audible alarm)
...
Architecture
Amazon Web Services provides services and
infrastructure to build reliable, fault-tolerant, and
highly availa...
Managed Services Reduce:
Managed Services

Amazon Relational
Database Service
(RDS)

Amazon
ElastiCache

Amazon Simple
Queue Service
(SQS)

Elastic...
$0.028
per hour

DNS

Elastic Load
Balancing

Web Servers
Availability Zone
$0.028
per hour

DNS

Elastic Load
Balancer

Web Servers
Availability Zone

VS

$0.08
per hour
(small instance)

DNS

EC2 ...
Consumers
Producer

$0.50 per
1,000,000 Requests
($0.0000005 per Request)

SQS queue
Consumers
Producer

SQS queue

$0.50 per
1,000,000 Requests
($0.0000005 per Request)

VS

$0.08
per hour
(small instance)
...
Consolidated Billing
RI Purchases to grow a Resource Pool
[Chart: Reserved Instance Pool; series A to E; vertical axis 0 to 35; horizontal axis 1 to 9 ...]
Tiered Pricing
Flexibility: Take advantage!
Architecture
vs.
Gardening
STOP/START
size changes
new instance types
vary capacity
rearrange...
What are Spot Instances?
• Value
 Pricing
• Up to 92% discount

 Elastic
• Capacity not otherwise
available

 Minimum C...
Key Points about Spot
• Spare capacity – supply and demand
• Be prepared for no availability at times
• Be willing to a...
Consider 2 Time-to-Value Scenarios
1) Value of results quickly diminishes

2) Value of result stable until deadline

e.g.,...
Spot Applications
Ideal Applications
Batch Processing
Time-Delayable
Fault-Tolerant or Restartable
Compute-Intensive
Horiz...
Spot Advice and Tips
• Don’t build your reliability ENTIRELY on spot
vServ.mobi – exceptional and smart architecture
• Wit...
Look at the World Differently
• Order of magnitude more capacity
• New experiments enabled = innovation!
• Lucky Oy...
A Spot Case Study
Ashay Padwal
CoFounder & CTO – vServ.mobi
GLOBAL

INNOVATION

FOCUSED

Award Winning
Mobile Ad Exchange
across Emerging Markets
31 Bn Ad Requests / Month

11% EUROPE

11% REST OF ASIA

7% NORTH
AMERICA
33% INDIA

10% SOUTH
AMERICA

14% MIDDLE
EAST & ...
Infrastructure: Requirements & Challenges
1

2

3

4

Requirement: Self Serve for Publisher On-boarding & Exit
Challenge: ...
Infrastructure: Solution
1

AWS

2

AWS

3

EC2 & ELB – Multi-AZ
Route53, CloudFront, S3

4

US East, US West, Europe, Sou...
Deployment Overview
Ad Delivery Setup
Now What? Reduce Cost without impacting Performance
• AWS is pretty cost-effective. But we were greedy!

• Saving more mea...
What we did
1

2

Selected the right Instance Type
- use CloudWatch for CPU & memory usage
- Load Test

Designed our serve...
What AWS did
1

Reduced pricing for EC2 (On Demand & Reserved) and S3

2

Cheap Archival System - Glacier

3

Pre warming ...
THANK YOU!
Ashay Padwal
CTO & Co-Founder
ashay@vserv.mobi
Closing – Key Takeaways
• Re-evaluate, revisit and re:Invent
Evolve along with AWS
• Leverage
Managed Services, CloudWatch
...
Please give us your feedback on this
presentation

CPN211
As a thank you, we will select prize
winners daily for completed...
Reducing Cost & Maximizing Efficiency: Tightening the Belt on AWS (CPN211) | AWS re:Invent 2013
This session dives deep into techniques used by successful customers who optimized their use of AWS. Learn tricks and hear tips you can implement right away to reduce waste, choose the most efficient instance, and fine-tune your spending, often with improved performance and a better end-customer experience. We showcase innovative approaches and demonstrate easily-applicable methods for cost optimizing Amazon EC2, Amazon S3, and a host of other services to save you time and money.

Published in: Technology

  1. 1. CPN211 - Reducing Cost and Maximizing Efficiency: Tightening the Belt on AWS Tom Johnston - Business Development Manager, Amazon Web Services Sean Simpson - Director of Operations, Stitcher, Inc. Kingsley Wood - Business Development Manager, Amazon Web Services Ashay Padwal - CTO, Vserv.mobi November 15, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  2. 2. Introductions and Outline • Tom Johnston (AWS) Reducing Cost and Spending Smart • Sean Simpson (Stitcher) Moving to AWS – A Story • Kingsley Wood (AWS) Maximizing Efficiency and Cost Optimization • Ashay Padwal (vServ.mobi) a Spot Case Study
  3. 3. Reducing Cost and Spending Smart Tom Johnston – Business Development Manager, AWS
  4. 4. Fundamentals • • • • • Explicit Objectives Match Instances with Workloads Match Scale & Use with Demand Match Purchasing with Utilization Governance Matters
  5. 5. Objectives
  6. 6. Objectives AWS provides you the ability to match your architecture to your objectives
  7. 7. Instance types Start Choose an instance that best meets your basic requirements Match memory & virtual cores
  8. 8. Instance types Start Tune Choose an instance that best meets your basic requirements Change instance size up or down based upon monitoring Match memory & virtual cores Use CloudWatch & Trusted Advisor to assess
  9. 9. Know your usage Instance Free Memory Free CPU Free HDD … Custom Metrics … At 1-min intervals PUT 2 weeks Amazon CloudWatch Alarm
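As a rough illustration of the custom-metric PUT on slide 9, the sketch below builds one datum for free disk space using only the Python standard library. The namespace, metric name, and instance ID are hypothetical. The dictionary keys follow the shape accepted by boto3's CloudWatch `put_metric_data` call, and the call itself is left as a comment because it needs AWS credentials.

```python
import shutil
from datetime import datetime, timezone

NAMESPACE = "Custom/Hypothetical"    # assumption: any custom namespace name
INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical instance ID


def free_disk_datum(path: str, instance_id: str) -> dict:
    """Build one CloudWatch-style datum reporting free bytes on `path`."""
    usage = shutil.disk_usage(path)
    return {
        "MetricName": "FreeDisk",  # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(usage.free),
        "Unit": "Bytes",
    }


datum = free_disk_datum(".", INSTANCE_ID)
print(datum["MetricName"], datum["Value"], datum["Unit"])

# With AWS credentials configured, a scheduler could send this once a minute:
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(Namespace=NAMESPACE, MetricData=[datum])
```

Free memory and free CPU would follow the same pattern with a different `MetricName`, `Value`, and `Unit`.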
  10. 10. More Memory Memory (GB) High-Mem Cluster Compute High Storage High I/O High Mem Cluster Compute M3 C3 M1 High-CPU Processing Ability More Processing
  11. 11. Instance types Start Tune Roll-Out Choose an instance that best meets your basic requirements Change instance size up or down based upon monitoring Run multiple instances in multiple Availability Zones Match memory & virtual cores Use CloudWatch & Trusted Advisor to assess
  12. 12. Choose your metric optimize for the metric
  13. 13. Choose your metric optimize for the metric Cost per unit of work per instance(size) Workload A Workload B Workload C Optimal on 4x m1.xlarge Optimal on 10x m1.medium Optimal on 2x m3.xxlarge
  14. 14. Choose your metric optimize for the metric Cost per unit of work per instance (size) 100 concurrent jobs on 10 x m1.large @ $0.26 / hr = $ 0.026 / job vs 300 concurrent jobs on 10 x m3.xlarge @ $0.58 / hr = $ 0.019 / job
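The per-job figures on slide 14 can be reproduced with a small calculation. This is a minimal sketch that assumes the quoted rates are hourly prices per instance and that "cost per job" means the fleet's hourly cost divided by the number of jobs it runs concurrently; the function name is illustrative.

```python
def cost_per_job(instance_count: int, hourly_rate: float, concurrent_jobs: int) -> float:
    """Hourly fleet cost divided by the jobs the fleet runs concurrently."""
    return instance_count * hourly_rate / concurrent_jobs


m1_large = cost_per_job(10, 0.26, 100)   # 10 x m1.large @ $0.26/hr, 100 jobs
m3_xlarge = cost_per_job(10, 0.58, 300)  # 10 x m3.xlarge @ $0.58/hr, 300 jobs

print(f"m1.large:  ${m1_large:.3f} / job")
print(f"m3.xlarge: ${m3_xlarge:.3f} / job")
```

The instance with the higher hourly rate comes out cheaper per job because it packs three times as many jobs into the same instance count.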
  15. 15. Choose your metric optimize for the metric Think workload density Don’t just focus on instance hourly rate
  16. 16. Server Load 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  17. 17. Server Load Capacity of 1 Server 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  18. 18. Server Load Traditional capacity required Capacity of 1 Server 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  19. 19. Server Load Traditional capacity required Capacity of 1 Server 1 Server for 8 hours 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  20. 20. Server Load Traditional capacity required Capacity of 1 Server 1 Server for 8 hours 1 Server for 8 hours 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  21. 21. Traditional capacity required Server Load 1 Server for 8 hours Capacity of 1 Server 1 Server for 8 hours 1 Server for 8 hours 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  22. 22. Traditional capacity required Server Load 1 Server for 8 hours Capacity of 1 Server 1 Server for 8 hours 1 Server for 8 hours 1 Server for 8 hours 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
  23. 23. Server Load Traditional capacity required Capacity of 1 Server 1/3rd Saving 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Hour of day
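One way to read the "1/3rd Saving" on slide 23 is in server-hours. The sketch below assumes the traditional setup keeps two servers running all day (the chart does not state the number, so this is an assumption) and that the elastic setup pays only for the four "1 Server for 8 hours" blocks shown on the preceding slides.

```python
from fractions import Fraction

HOURS_PER_DAY = 24
traditional_servers = 2        # assumption: peak load needs two servers all day
elastic_blocks = [8, 8, 8, 8]  # four "1 Server for 8 hours" blocks from the slides

traditional_hours = traditional_servers * HOURS_PER_DAY  # 48 server-hours
elastic_hours = sum(elastic_blocks)                      # 32 server-hours

saving = Fraction(traditional_hours - elastic_hours, traditional_hours)
print(f"Saving: {saving}")
```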
  24. 24. 6 Instance Count 5 4 3 2 1 0 0 2 4 6 8 10 12 14 16 18 Day of Month 20 22 24 26 28 30
  25. 25. 6 Instance Count 5 Monthly predictable peak processing 4 3 2 1 0 0 2 4 6 8 10 12 14 16 18 Day of Month 20 22 24 26 28 30
  26. 26. Traditional capacity required 6 Instance Count 5 4 3 2 1 0 0 2 4 6 8 10 12 14 16 18 Day of Month 20 22 24 26 28 30
  27. 27. Traditional capacity required 6 Instance Count 5 4 3 2 1 Elastic Capacity 0 0 2 4 6 8 10 12 14 16 18 Day of Month 20 22 24 26 28 30
  28. 28. Traditional capacity required 6 Instance Count 5 4 75% Savings 3 2 1 Elastic Capacity 0 0 2 4 6 8 10 12 14 16 18 Day of Month 20 22 24 26 28 30
  29. 29. Reserved instances On-demand instances Unix/Linux instances start at $0.02/hour Pay as you go for compute power Low cost and flexibility Pay only for what you use, no up-front commitments or long-term contracts Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing
  30. 30. Reserved instances On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery
  31. 31. Reserved instances Heavy utilization RI On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery Up to 58% Savings
  32. 32. Reserved instances Heavy utilization RI > 80% utilization Lower costs up to 58% On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline
  33. 33. Reserved instances Heavy utilization RI > 80% utilization Lower costs up to 58% On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline Medium utilization RI Up to 49% Savings
  34. 34. Reserved instances Heavy utilization RI > 80% utilization Lower costs up to 58% On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline Medium utilization RI 41-79% utilization Lower costs up to 49% Use Cases: Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery Use Cases: Web applications, many heavy processing tasks, running much of the time
  35. 35. Reserved instances Heavy utilization RI > 80% utilization Lower costs up to 58% On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline Medium utilization RI 41-79% utilization Lower costs up to 49% Use Cases: Web applications, many heavy processing tasks, running much of the time Use Cases: Light utilization RI Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery Up to 34% Savings
  36. 36. Reserved instances Heavy utilization RI > 80% utilization Lower costs up to 58% On-demand instances Reserved instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Low cost and flexibility Low Cost / Predictability Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Databases, Large Scale HPC, Always-on infrastructure, Baseline Medium utilization RI 41-79% utilization Lower costs up to 49% Use Cases: Web applications, many heavy processing tasks, running much of the time Use Cases: Light utilization RI Applications with steady state or predictable usage Applications that require reserved capacity, including disaster recovery 15-40% utilization Lower costs up to 34% Use Cases: Disaster Recovery, Weekly / Monthly reporting, Elastic Map Reduce
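The utilization bands on slide 36 can be expressed as a simple lookup. This is a sketch, not AWS guidance: the slide leaves gaps at exactly 80% and between 40% and 41%, so assigning those values to the lower band is an assumption, as is falling back to on-demand below 15%.

```python
def suggest_purchase_option(utilization_pct: float) -> tuple[str, int]:
    """Return (option, 'up to' saving in percent) for an expected utilization."""
    if not 0 <= utilization_pct <= 100:
        raise ValueError("utilization must be between 0 and 100")
    if utilization_pct > 80:
        return ("Heavy utilization RI", 58)
    if utilization_pct >= 41:  # assumption: exactly 80% counts as medium
        return ("Medium utilization RI", 49)
    if utilization_pct >= 15:  # assumption: 40-41% counts as light
        return ("Light utilization RI", 34)
    return ("On-demand", 0)    # assumption: below 15%, stay on-demand


for pct in (95, 60, 25, 5):
    option, saving = suggest_purchase_option(pct)
    print(f"{pct:>3}% utilization: {option} (up to {saving}% lower cost)")
```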
  37. 37. Best RI for Utilization $18,000 $16,000 $14,000 $12,000 $10,000 $8,000 Heavy Medium Light $6,000 $4,000 $2,000 $- O-Demand
  38. 38. Best RI for Utilisation $18,000 $16,000 $14,000 $12,000 $10,000 $8,000 Heavy Medium Light $6,000 $4,000 $2,000 $- O-Demand
  39. 39. Optimizing costs with RIs 14 12 On Demand 10 Light Utilization RI 8 Medium Utilization RI 6 Heavy utilization RI 4 2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
  40. 40. Spot instances On-demand instances Reserved instances Spot instances Unix/Linux instances start at $0.02/hour 1- or 3-year terms Bid on unused EC2 capacity Pay as you go for compute power Pay low up-front fee, receive significant hourly discount Spot Price based on supply/demand, determined automatically Low cost and flexibility Low Cost / Predictability Cost / Large Scale, dynamic workload handling Pay only for what you use, no up-front commitments or long-term contracts Helps ensure compute capacity is available when needed Use Cases: Applications with short term, spiky, or unpredictable workloads; Application development or testing Use Cases: Use Cases: Applications with flexible start and end times Applications with steady state or predictable usage Applications only feasible at very low compute prices Applications that require reserved capacity, including disaster recovery
  41. 41. Governance Matters • Who can create and launch instances? • Who checks that only needed instances are running? • Have specific policies • Use AWS tools such as IAM to help enforce them
  42. 42. Checklist • • • • • • Identify your goals Understand your workload & match to instances Scale up and down with demand Align purchasing methods & utilization Have governance appropriate to your goals Change in goals & workload will drive change in use of AWS
  43. 43. Moving to AWS – A Story Sean Simpson Director of Operations - Stitcher, Inc.
  44. 44. What is Stitcher? • Stitcher is to news and talk radio what Pandora is to music • Stitcher is a content aggregator • Stitcher is an on-demand service • Stitcher is deployed on mobile, CE, and automotive platforms
  45. 45. Stitcher by the Numbers • • • • 12 million downloads 20,000+ shows Over 1 million hours of listening weekly Over 100 TB outbound data monthly
  46. 46. With Growth Comes Pain • DRBD database locked us into hardware • Sublease of colocation facility restricted our access to our servers • Server leases and purchases constrained our architecture • Growth inhibited by human, server, and vendor resources
  47. 47. What options did we consider? • Move to another colocation facility • Move to a cloud provider • Move to a hybrid colocation/cloud provider
  48. 48. Why we chose Amazon Web Services • Familiarity – Already using Amazon Simple Storage Service for our RSS feeds – Already experimenting with Amazon Elastic Compute Cloud – Recently implemented Amazon Simple Queue Service
  49. 49. Why we chose Amazon Web Services • Flexibility / Scalability – Ability to adjust resources quickly in our production environment – Ability to create any number of environments – Ability to design servers as we wanted with respect to operating systems, systems software, etc.
  50. 50. Why we chose Amazon Web Services • Cost – – – – Cost matches usage Bandwidth savings when using Amazon CloudFront as our CDN Many resources to assist in optimization Put simply, we got our solution for the lowest quote
  51. 51. Why we chose Amazon Web Services • Documentation & Customer Service – Knowledgeable solutions architects – “Right-level” documentation – Quick response to our needs
  52. 52. Architecting Change • Ask yourself: What are we trying to achieve? • Know yourself, know your systems • Consider industry best practices (but don’t blindly follow them) • Read the documentation
  53. 53. Use Puppet or Chef • Configuration management tools are both enabling and liberating • Build, destroy, and build again • Write once, build many • Nuances between node types are managed with clearly written rules • Naming conventions are your friend
  54. 54. Our Architecture
  55. 55. Looks nice, but what does it do? • • • • • High Availability Scalability Security Performance Cost effectiveness
  56. 56. The Results – Database connections/sec Before 225 After 450 0 100 200 300 400 500
  57. 57. The Results – GetStationPlaylist() Before 0.75 After 0.1 0 0.2 0.4 0.6 0.8
  58. 58. The Results – Maximum throughput Before 5000 After 20000 0 5000 10000 15000 20000 25000
  59. 59. The Results – Downtime Before 1200 15 After 0 200 400 600 800 1000 1200 1400
  60. 60. Cost Optimization Results • Twice the results for the same money
  61. 61. How we save money • • • • • • • Reserved instances Appropriate instance types CloudFront CDN Rapid reorganization using the API Monitor utilization Load test Housecleaning
  62. 62. On Deck Cost Savings • • • • Spot instances for processing tasks Auto Scaling In-app optimizations Instance type tuning
  63. 63. Parting Advice • Architect for 10X • Take the time to get it right the first time (or at least, close enough) • Plan on continuous evolution of systems
  64. 64. Maximizing Efficiency and Cost Optimization Kingsley Wood – Business Development Manager, AWS
  65. 65. Considerations • • • • • • Offloading – reduce footprint Utilization – your biggest lever Managed Services – leverage RDS, SQS, SES Consolidated Billing – pooling resources Flexible Evolution – continually revisit Spot Instances – think big, new possibilities
  66. 66. OFFLOAD all static content • reduce your compute demand and costs • improve end-user experience • increase reliability and durability +
  67. 67. ENTIRE SITE via CloudFront • minimize client-server chatter (keep it at the edge) • reduce server-database traffic (cache the common calls) • speed up mobile app response (persistent connections) +
  68. 68. Real World Example. Standard Setup: • 4 x Medium Instances $485 • AWS Data Transfer 1 TB $194 • Total = $679. Optimized: • 1 x Medium Instance $121 • CloudFront Data 1 TB $168 • CloudFront Requests $1.89 • Total = $291. 57% Lower Cost + 6X Faster
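The totals and the 57% figure on slide 68 can be checked by summing the line items. The sketch uses only the dollar amounts from the slide; the dictionary and variable names are illustrative.

```python
standard = {
    "4 x Medium Instances": 485.00,
    "AWS Data Transfer 1 TB": 194.00,
}
optimized = {
    "1 x Medium Instance": 121.00,
    "CloudFront Data 1 TB": 168.00,
    "CloudFront Requests": 1.89,
}

standard_total = sum(standard.values())
optimized_total = sum(optimized.values())
saving_pct = (standard_total - optimized_total) / standard_total * 100

print(f"Standard:  ${standard_total:.2f}")
print(f"Optimized: ${optimized_total:.2f}")
print(f"Lower cost: {saving_pct:.0f}%")
```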
  69. 69. Offloading Tips • Leverage S3, CloudFront, Route 53 • Eliminate repeated calls (edge and data cache) • Static website hosting on S3 No web server at all! • Minimize your EC2 and database footprint stand up Read Replicas for variable loads
  70. 70. Utilization and Auto-Scaling: Granularity more small instances vs. fewer large instances 29 Large @ $0.32/hr = $9.28 59 Small @ $0.08/hr = $4.72
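The hourly figures on slide 70 follow from count times rate. The slide does not say how the two fleets compare in capacity, so this sketch only reproduces the arithmetic and the resulting difference in hourly spend.

```python
def fleet_hourly_cost(count: int, hourly_rate: float) -> float:
    """Hourly cost of `count` identical instances, rounded to cents."""
    return round(count * hourly_rate, 2)


large_fleet = fleet_hourly_cost(29, 0.32)  # 29 Large @ $0.32/hr
small_fleet = fleet_hourly_cost(59, 0.08)  # 59 Small @ $0.08/hr
reduction_pct = (large_fleet - small_fleet) / large_fleet * 100

print(f"Large fleet: ${large_fleet:.2f}/hr")
print(f"Small fleet: ${small_fleet:.2f}/hr")
print(f"Difference:  {reduction_pct:.0f}% lower")
```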
  71. 71. Utilization – Trigger Actions by Event Leverage CloudWatch to collect and measure metrics
  72. 72. Buuuk for Singapore Press Holdings (SPH)
  73. 73. The Straits Times Mobile App REAL-TIME reaction response • • • • notification of pending News Flash (with audible alarm) on-demand ramp up of capacity (6 mins) subscriber alert push delivered mass response traffic handled (followed by ramp down)
  74. 74. Architecture Amazon Web Services provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the cloud. These qualities have been designed into our services both by handling such aspects without any special action by you and by providing features that must be used explicitly and correctly.
  75. 75. Managed Services Reduce:
  76. 76. Managed Services Amazon Relational Database Service (RDS) Amazon ElastiCache Amazon Simple Queue Service (SQS) Elastic Load Balancing Amazon Elastic MapReduce Amazon Simple Email Service (SES) Amazon Simple Notification Service (SNS)
  77. 77. $0.028 per hour DNS Elastic Load Balancing Web Servers Availability Zone
  78. 78. $0.028 per hour DNS Elastic Load Balancer Web Servers Availability Zone VS $0.08 per hour (small instance) DNS EC2 instance + software LB Web Servers Availability Zone
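To put the two hourly rates on slide 78 on a monthly footing, the sketch below multiplies each by the hours in a month. It assumes a 30-day month and compares hourly charges only; data-processing charges and the effort of running your own load balancer are outside this comparison.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month

ELB_HOURLY = 0.028             # Elastic Load Balancing, from the slide
SMALL_INSTANCE_HOURLY = 0.08   # EC2 small instance + software LB, from the slide


def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of running continuously for `hours`, rounded to cents."""
    return round(hourly_rate * hours, 2)


elb_monthly = monthly_cost(ELB_HOURLY)
self_managed_monthly = monthly_cost(SMALL_INSTANCE_HOURLY)

print(f"Elastic Load Balancing: ${elb_monthly:.2f}/month")
print(f"EC2 + software LB:      ${self_managed_monthly:.2f}/month")
```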
  79. 79. Consumers Producer $0.50 per 1,000,000 Requests ($0.0000005 per Request) SQS queue
  80. 80. Consumers Producer SQS queue $0.50 per 1,000,000 Requests ($0.0000005 per Request) VS $0.08 per hour (small instance) Producer EC2 instance + software queue Consumers
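The SQS comparison on slide 80 implies a break-even request rate: the number of requests per hour at which SQS charges equal the hourly price of one small instance. The sketch below derives it from the two prices on the slide and ignores everything else, such as the capacity or availability of a single self-managed queue host.

```python
SQS_PRICE_PER_MILLION = 0.50   # dollars per 1,000,000 requests, from the slide
SMALL_INSTANCE_HOURLY = 0.08   # dollars per hour, from the slide

price_per_request = SQS_PRICE_PER_MILLION / 1_000_000
break_even_requests_per_hour = round(SMALL_INSTANCE_HOURLY / price_per_request)


def sqs_hourly_cost(requests_per_hour: int) -> float:
    """SQS request charges for one hour at the given request rate."""
    return requests_per_hour * price_per_request


print(f"Price per request: ${price_per_request:.7f}")
print(f"Break-even: {break_even_requests_per_hour:,} requests/hour")
print(f"10,000 requests/hour on SQS: ${sqs_hourly_cost(10_000):.4f}/hour")
```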
  81. 81. Consolidated Billing
  82. 82. RI Purchases to grow a Resource Pool 35 30 25 E D C B A 20 15 Reserved Instance Pool 10 5 0 1 2 3 4 5 6 7 8 9 10 11 12
  83. 83. Tiered Pricing
  84. 84. Flexibility: Take advantage! Architecture vs. Gardening • STOP/START • Size changes • New instance types • Vary capacity • Rearrange, etc.
  85. 85. What are Spot Instances? • Value: Pricing (up to 92% discount); Elastic (capacity not otherwise available); Minimum Commitment (commit to 1 hour) • Tradeoff: Potential for interruption
  86. 86. Key Points about Spot • Spare capacity – supply and demand • Be prepared for no availability at times • Be willing to accept and deal with interruption • Far greater potential scale, starting at 5X default instance limits • Massive possible capacity = new ideas…
  87. 87. Consider 2 Time-to-Value Scenarios 1) Value of results quickly diminishes (e.g., engineering simulations) 2) Value of result stable until deadline (e.g., analytics before an M&A deal)
  88. 88. Spot Applications • Ideal Applications: Batch Processing; Time-Delayable; Fault-Tolerant or Restartable; Compute-Intensive; Horizontally Scalable; Stateless Worker Nodes; Region and AZ Independent; Uses Deployment Automation • Less Ideal Applications: Interactive; Strict/Tight SLA for Completion; Expensive to Handle Terminations; Data-Intensive; In-Memory Scaling; Long-Running Worker Nodes; Requires a Single AZ; Manually Launched and Managed
  89. 89. Spot Advice and Tips • Don’t build your reliability ENTIRELY on spot (vServ.mobi: exceptional and smart architecture) • With time flexibility, different approaches: delayed results, lower cost; spend less, quicker answers • Ask different questions: with enormous capacity, what is now possible?
  90. 90. Look at the World Differently • Order of magnitude more capacity • New experiments enabled = innovation! • Lucky Oyster – recommendation exchange • Prototyping a new search technology idea (using Common Crawl) • 3.4 billion web pages > 1 TB of data > Index of 400 million entities • “The cost? About $100... in about 14 hours”
  91. 91. A Spot Case Study Ashay Padwal CoFounder & CTO – vServ.mobi
  92. 92. GLOBAL INNOVATION FOCUSED Award Winning Mobile Ad Exchange across Emerging Markets
  93. 93. 31 Bn Ad Requests / Month • 11% EUROPE • 11% REST OF ASIA • 7% NORTH AMERICA • 33% INDIA • 10% SOUTH AMERICA • 14% MIDDLE EAST & AFRICA • 14% SE ASIA • Over 200 Mn Unique Users / Month
  94. 94. Infrastructure: Requirements & Challenges 1) Requirement: Self Serve for Publisher On-boarding & Exit. Challenge: No Capacity Planning; Extreme Scalability. 2) Requirement: Start Up. Challenge: No Capex, no Lock-in. 3) Requirement: Least Latency & High Availability. Challenge: Suite of services – Compute, Load Balancing, DNS, CDN, Storage, Multiple DCs per location. 4) Requirement: Global Setup management with small team. Challenge: Availability across Regions with extensive APIs.
  95. 95. Infrastructure: Solution 1) AWS 2) AWS 3) EC2 & ELB – Multi-AZ; Route53, CloudFront, S3 4) US East, US West, Europe, South America, Asia. For Middle East, we host in Turkey. For Africa, we host in South Africa.
  96. 96. Deployment Overview
  97. 97. Ad Delivery Setup
  98. 98. Now What? Reduce Cost without impacting Performance • AWS is pretty cost-effective. But we were greedy! • Saving more meant more money for other areas in our business. • We walked in the opposite direction... and it worked! • We use spot instances in production extensively. • Sounds risky? Yes, but if you architect your system correctly, you should be safe.
  99. 99. What we did 1) Selected the right Instance Type: use CloudWatch for CPU & memory usage; Load Test 2) Designed our servers to be self-sufficient and perishable: Business logic & DB on same server; Transaction Logs written to EBS; Auto Setup on Server; Data Collection module 3) We built a custom Scaling solution: Add/Remove instances by checking present traffic & predicting traffic in the immediate future; Based on trending of spot prices, either try launching spot or fall back to on-demand instances; Remove servers if in use between 45-55 min; Track spot prices to shift to on-demand
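The custom scaling rules listed on this slide could look roughly like the sketch below. This is not vServ's implementation: every function name, the `headroom` threshold, and the "not rising" trend test are assumptions made for illustration, and the 45-55 minute rule is read here as "minutes into the instance's current hour of use".

```python
import math
from typing import Sequence


def desired_instances(current_rps: float, predicted_rps: float,
                      rps_per_instance: float) -> int:
    """Size the fleet for the larger of present and predicted traffic."""
    peak = max(current_rps, predicted_rps)
    return max(1, math.ceil(peak / rps_per_instance))


def pick_market(recent_spot_prices: Sequence[float], on_demand_price: float,
                headroom: float = 0.8) -> str:
    """Return "spot" or "on-demand" based on recent spot price history.

    Assumed rule: try spot only when the latest spot price is below
    `headroom` times the on-demand price and is not trending upward.
    """
    if not recent_spot_prices:
        return "on-demand"  # no price data: fall back to on-demand
    latest = recent_spot_prices[-1]
    rising = len(recent_spot_prices) >= 2 and latest > recent_spot_prices[0]
    if latest < headroom * on_demand_price and not rising:
        return "spot"
    return "on-demand"


def in_removal_window(uptime_minutes: int, start: int = 45, end: int = 55) -> bool:
    """True when the instance is 45-55 minutes into its current hour of use."""
    return start <= uptime_minutes % 60 <= end


print(desired_instances(900, 1500, 500))
print(pick_market([0.02, 0.02], on_demand_price=0.08))
print(in_removal_window(110))
```

Restricting removal to a window late in the hour fits the one-hour commitment mentioned on the Spot slides: capacity that has already been paid for stays in service until shortly before the hour ends.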
  100. 100. What AWS did 1) Reduced pricing for EC2 (On Demand & Reserved) and S3 2) Cheap Archival System: Glacier 3) Pre-warming of Load Balancer (ELB) 4) AMI movement across regions 5) ELB with equal distribution of traffic across instances spread in any Availability Zone
  101. 101. THANK YOU! Ashay Padwal CTO & Co-Founder ashay@vserv.mobi
  102. 102. Closing – Key Takeaways • Re-evaluate, revisit and re:Invent: evolve along with AWS • Leverage Managed Services, CloudWatch • Stay up to date: RI modifications, Trusted Advisor • AWS Blog: aws.typepad.com
  103. 103. Please give us your feedback on this presentation CPN211 As a thank you, we will select prize winners daily for completed surveys!