Cloud Native Cost Optimization
Adrian Cockcroft @adrianco
Technology Fellow - Battery Ventures
November 2014
@adrianco
What does
everyone want?
@adrianco
Less Time
Less Cost
@adrianco
Faster!
See talks by @adrianco
Speed and Scale - QCon New York
Fast Delivery - GOTO Copenhagen
@adrianco
Cheaper!
This brand new talk:
How to use Cloud Native architecture to
reduce cost without slowing down
@adrianco
How is Cost
Measured?
1
Bottom Up
2
Product
3
Top Down
@adrianco
Bottom Up Costs
Add up the cost to
buy and operate
every component
1
@adrianco
Product Cost
Cost of delivering
and maintaining
each product
2
@adrianco
Top Down Costs
Divide total budget
by the number of
components
3
@adrianco
Top Down vs. Bottom Up
Will never match!
!
Hidden Subsidies vs.
Hidden Costs1
3
@adrianco
Agile Team Product Profit
Value minus costs
Time to value
ROI, NPV, MMF
Profit center
2
$
@adrianco
What about cloud
costs?
@adrianco
Cloud Native Cost Optimization
Optimize for speed first
Turn it off!
Capacity on demand
Consolidate and Reserve
Plan for price cuts
FOSS tooling
$$ $
@adrianco
Do The Impossible Immediately
Ne#lix'Examples'
•  European'Launch'using'AWS'Ireland'
–  No'employees'in'Ireland,'no'provisioning'delay,'everything'
worked'
–  No'need'to'do'detailed'capacity'planning'
–  OverAprovisioned'on'day'1,'shrunk'to'fit'aDer'a'few'days'
–  Capacity'grows'as'needed'for'addiGonal'country'launches'
•  Brazilian'Proxy'Experiment'
–  No'employees'in'Brazil,'no'“meeGngs'with'IT”'
–  Deployed'instances'into'two'zones'in'AWS'Brazil'
–  Experimented'with'network'proxy'opGmizaGon'
–  Decided'that'gain'wasn’t'enough,'shut'everything'down'
@adrianco
The Capacity
Planning Problem
@adrianco
Best Case Waste
Product(Launch(Agility(2(Rightsized(
Pre2Launch(
Build2out(
Tes9ng(
Launch(
Grow
th(
Grow
th( Demand(
Cloud(
Datacenter(
$(
Cloud capacity
used is maybe
half average
DC capacity
@adrianco
Failure to Launch
Product(Launch(-(Under-es1mated(
Pre-Launch
Build-out
Testing
Launch
G
row
th
G
row
th
Mad scramble
to add more DC
capacity during
launch phase
outages
@adrianco
Over the Top Losses
Product(Launch(Agility(–(Over6es8mated(
Pre-Launch
Build-out
Testing
Launch
G
row
th
G
row
th
$
Capacity wasted
on failed launch
magnifies the
losses
@adrianco
Turning off Capacity
Off-peak production
Test environments
Dev out of hours
Dormant Data ScienceWhen%you%turn%off%your%cloud%resources,%
you%actually%stop%paying%for%them%
@adrianco
Turn off Test Environments
Snapshot or freeze
Fast restart needed
Persistent storage
40 of 168 hrs/wk
@adrianco
Seasonal Savings
1 5 9 13 17 21 25 29 33 37 41 45 49
WebServers
Week
Optimize during a year
50% Savings
Weekly&CPU&Load&
@adrianco
Autoscale the Costs Away
50%+%Cost%Saving%
Scale%up/down%
by%70%+%
Move%to%Load=Based%Scaling%
@adrianco
Daily Duty Cycle
Business'Throughput'Instances'
Reactive Autoscaling
saves around 50%
Predictive Autoscaling saves around 70%
See Scryer on Netflix Tech Blog
@adrianco
Underutilized and Unused
AWS$Support$–$Trusted$Advisor$–$
Your$personal$cloud$assistant$
@adrianco
Clean Up the Crud
Other&simple&op-miza-on&-ps&
•  Don’t&forget&to…&
– Disassociate&unused&EIPs&
– Delete&unassociated&Amazon&
EBS&volumes&
– Delete&older&Amazon&EBS&
snapshots&
– Leverage&Amazon&S3&Object&
Expira-on&
& Janitor&Monkey&cleans&up&unused&resources&
@adrianco
Total Cost of Oranges
When%Comparing%TCO…!
Make!sure!that!
you!are!including!
all!the!cost!factors!
into!considera4on!
Place%
Power%
Pipes%
People%
Pa6erns%
How much does Openstack
or ESX datacenter automation
software and support
cost per instance?
@adrianco
When Do You Pay?
@adrianco
bill
Now
Next
Month
Ages
Ago
Lease
Building
Install
AC etc
Rack &
Stack
Private
Cloud SW
Run
My Stuff
Datacenter Up Front Costs
@adrianco
Reservation Reductions
On Demand Light Use Medium Use Heavy Use
$ No Upfront $172 upfront $286 upfront $337 upfront
$0.070/hr $0.050/hr $0.022/hr $0.015/hr
$1840/36mo $1486/36mo $864/36mo $731/36mo
Savings 22% 53% 60%
Prices on Nov 11th, for m3.medium (1 vCPU, 3.75G RAM, SSD) purely to show typical savings
@adrianco
Blended BenefitsMix$and$Match$Reserved$Types$and$On4Demand$
Instances(
Days(of(Month(
0
2
4
6
8
10
12
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Heavy&Utilization&Reserved Instances
Light&RI Light&RILight&RILight&RI
On8Demand
@adrianco
Consolidated Cost CuttingConsolidated+Billing:+Single+payer+for+a+group+of+
accounts+
•  One$Bill+for+mul7ple+accounts+
•  Easy$Tracking$of+account+
charges+(e.g.,+download+CSV+of+
cost+data)+
•  Volume$Discounts+can+be+
reached+faster+with+combined+
usage+
•  Reserved$Instances$are+shared+
across+accounts+(including+RDS+
Reserved+DBs)+
@adrianco
Production PriorityOver%Reserve(the(Produc0on(Environment(
Produc0on(Env.(
Account(
100#Reserved#
QA/Staging(Env.(
Account(
0#Reserved##
Perf(Tes0ng(Env.(
Account(
0#Reserved##
Development(Env.(
Account(
0#Reserved#
Storage(Account( 0#Reserved#
Total#Capacity#
@adrianco
Actual Reservation UsageConsolidated+Billing+Borrows+Unused+Reserva4ons+
Produc4on+Env.+
Account+
68#Used#
QA/Staging+Env.+
Account+
10#Borrowed#
Perf+Tes4ng+Env.+
Account+
6#Borrowed##
Development+Env.+
Account+
12#Borrowed#
Storage+Account+ 4#Borrowed#
Total#Capacity#
@adrianco
Consolidated Reservations
Burst capacity guarantee
Higher availability with lower cost
Other accounts soak up any extra
Monthly billing roll-up
Capitalize reservation charges!
But: Fixed location and instance type
@adrianco
Re-sell Reservations
Reserved'Instance'Marketplace'
Buy$a$smaller$term$instance$
Buy$instance$with$different$OS$or$type$
Buy$a$Reserved$instance$in$different$region$
Sell$your$unused$Reserved$Instance$
Sell$unwanted$or$over;bought$capacity$
Further$reduce$costs$by$op>mizing$
@adrianco
Use EC2 Spot Instances
Cloud native
dynamic autoscaled
spot instances
!
Real world total
savings up to 50%
@adrianco
Right Sizing Instances
Fit the instance size to the workload
@adrianco
Technology Refresh
Older m1 and m2 families
• Slower CPUs
• Higher response times
• Smaller caches (6MB)
• Oldest m1.xlarge
• 15G/8.5ECU/35c 23ECU/$
• Old m2.xlarge
• 17G/6.5ECU/25c 26ECU/$
New m3 family
• Faster CPUs
• Lower response times
• Larger caches (20MB)
• Java perf ratio > ECU
• New m3.xlarge
• 15G/13ECU/28c 46ECU/$
• 77% better ECU/$
• Deploy fewer instances
@adrianco
Data Science for Free
Follow%the%Customer%(Run%web%servers)%during%the%day%
Follow%the%Money%(Run%Hadoop%clusters)%at%night%
0
2
4
6
8
10
12
14
16
Mon Tue Wed Thur Fri Sat Sun
No#of#Instances#Running#
Week
Auto7Scaling7Servers
Hadoop7Servers
No.%of%Reserved%
Instances%
@adrianco
Real Reservation ReductionsSoaking(up(unused(reserva0ons(
Unused(reserved(instances(is(published(as(a(metric(
(
Ne9lix(Data(Science(ETL(Workload(
•  Daily(business(metrics(rollAup(
•  Starts(aBer(midnight(
•  EMR(clusters(started(using(hundreds(of(instances(
Ne9lix(Movie(Encoding(Workload(
•  Long(queue(of(high(and(low(priority(encoding(jobs(
•  Can(soak(up(1000’s(of(addi0onal(unused(instances(
@adrianco
Six Ways to Cut Costs
#1#Business#Agility#by#Rapid#Experimenta8on#=#Profit#
#2#Business>driven#Auto#Scaling#Architectures#=#Savings##
#3#Mix#and#Match#Reserved#Instances#with#On>Demand#=#Savings#
#4#Consolidated#Billing#and#Shared#Reserva8ons#=#Savings#
#5#Always>on#Instance#Type#Op8miza8on#=#Recurring#Savings#
Building#Cost>Aware#Cloud#Architectures#
#6#Follow#the#Customer#(Run#web#servers)#during#the#day#
Follow#the#Money#(Run#Hadoop#clusters)#at#night#
@adrianco
Expect Prices to Drop
Three Years Halving Every 18mo = maybe 40% over-all savings
0
25
50
75
100
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
Data shown is purely illustrative
@adrianco
Combinations
@adrianco
Lift and Shift Compounding
0
25
50
75
100
Base Price Rightsized Seasonal Daily Scaling Reserved Tech Refresh Price Cuts
25
3030
707070
100 Traditional
application
using AWS
heavy use
reservations
Base price is for capacity bought up-front
@adrianco
Conservative Compounding
0
25
50
75
100
Base Price Rightsized Seasonal Daily Scaling Reserved Tech Refresh Price Cuts
15
20
25
35
50
70
100 Cloud native
application
partially optimized
light use reservations
@adrianco
Agressive Compounding
0
25
50
75
100
Base Price Rightsized Seasonal Daily Scaling Reserved Tech Refresh Price Cuts
46812
25
50
100 Cloud native application
fully optimized autoscaling
mixed reservation use
costs 4% of base price
over three years!
@adrianco
Cloud Native Patterns
● Business logic isolation in stateless micro-services
● Immutable code with instant rollback
● Auto-scaled capacity and deployment updates
● Distributed across availability zones and regions
● De-normalized single function NoSQL data stores
● See over 40 NetflixOSS projects at netflix.github.com
● Get “Technical Indigestion” trying to keep up with techblog.netflix.com
@adrianco
Cost Monitoring and Optimization
@adrianco
Final Thoughts
Turn off idle instances
Clean up unused stuff
Optimize for pricing model
Assume prices will go down
Go cloud native to save
@adrianco
Further Reading
See www.battery.com for a list of portfolio investments
● Battery Ventures Blog http://www.battery.com/powered
● Adrian’s Blog http://perfcap.blogspot.com and Twitter @adrianco
● Slideshare http://slideshare.com/adriancockcroft
!
● Monitorama Opening Keynote Portland OR - May 7
th
, 2014 - Video available
● GOTO Chicago Opening Keynote May 20
th
, 2014 - Video available
● Qcon New York – Speed and Scale - June 11
th
, 2014 - Video available
● Structure - Cloud Trends - San Francisco - June 19th, 2014 - Video available
● GOTO Copenhagen/Aarhus – Denmark – Sept 25
th
, 2014
● DevOps Enterprise Summit - San Francisco - Oct 21-23rd, 2014 - Videos available
● GOTO Berlin - Germany - Nov 6th, 2014
● AWS Re:Invent - Las Vegas - Cloud Native Cost Optimization - November 14th, 2014
● Dockercon Europe - Amsterdam - Microservices - December 4th, 2014

Cloud Native Cost Optimization