Capacity Planning for fun & profit

beyond cacti and top

II São Paulo Perl Workshop
Rodrigo Albani de Campos - @xinu
camposr@gmail.com

Agenda

• Capacity planning primer: a tale of
discovery
• Metrics
• Queues
• Models

Why Perl?

• Main reason: I feel comfortable with it
• Ubiquitous and free
• Plenty of stable statistics modules available
on CPAN
• Ultimately, it gets the job done

Capacity Planning
• Is just like sex...
• Everyone wants to do it
• Many say they’re doing it
• You always exaggerate how much of it
you’re doing
• Most people aren’t actually doing it (despite
their best efforts)
• Everybody else seems to be doing more
than you

A tale of discovery

There once was a system administrator...

A tale of discovery

How many servers do we need?
What's the actual capacity?
How much memory?
What's the predicted growth?
IO capacity?

Typical Performance Metrics
• Load Average - uptime
• The single most misunderstood metric
• CPU - mpstat
• IO - iostat
• Memory Usage - vmstat
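Load average is the one to be careful with: on Linux it averages the number of tasks that are runnable or in uninterruptible sleep (usually waiting on disk IO), so it is not a CPU utilization figure. When collecting it from scripts rather than eyeballing `uptime`, a small parser helps. This is a minimal sketch; `parse_loadavg` is a hypothetical helper, and the regex assumes the usual English `uptime` output on Linux and BSD/macOS.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the 1-, 5- and 15-minute load averages from
# one line of `uptime` output. Accepts both "load average: 0.15, 0.10, 0.05"
# (Linux) and "load averages: 1.52 1.60 1.70" (BSD/macOS).
sub parse_loadavg {
    my ($line) = @_;
    my @avg = $line =~ /load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)/
        or return;    # no match: empty list (undef in scalar context)
    return @avg;
}

my $line = ' 14:00:18 up 12 days,  3:04,  2 users,  load average: 0.15, 0.10, 0.05';
my ($one, $five, $fifteen) = parse_loadavg($line);
print "1 min: $one, 5 min: $five, 15 min: $fifteen\n";
```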

Time series charts
I’m looking at you, cacti huggers!

• Time series performance data is useful for:
• Troubleshooting
• Simplistic forecasting
• Finding trends
• Identifying seasonal behavior
• On its own, this is NOT capacity planning
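Simplistic forecasting usually means fitting a straight line through the history and extending it. A minimal sketch with ordinary least squares follows; `linear_fit` and the weekly peak numbers are hypothetical. Note what it cannot tell you: a straight line has no notion of saturation or contention, which is why trending on its own is not capacity planning.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: ordinary least-squares fit of y = intercept + slope * x.
sub linear_fit {
    my ($x, $y) = @_;
    my $n = @$x;
    die "need at least two points of equal count\n" if $n < 2 || $n != @$y;
    my $mean_x = sum(@$x) / $n;
    my $mean_y = sum(@$y) / $n;
    my ($sxy, $sxx) = (0, 0);
    for my $i (0 .. $n - 1) {
        $sxy += ($x->[$i] - $mean_x) * ($y->[$i] - $mean_y);
        $sxx += ($x->[$i] - $mean_x) ** 2;
    }
    die "x values must not all be equal\n" if $sxx == 0;
    my $slope = $sxy / $sxx;
    return ($mean_y - $slope * $mean_x, $slope);
}

# Hypothetical weekly peak hits/s read off a time series chart.
my @week = (1 .. 6);
my @peak = (40, 44, 47, 52, 55, 59);
my ($intercept, $slope) = linear_fit(\@week, \@peak);
printf "trend: %.2f hits/s per week; week 12 forecast: %.1f hits/s\n",
    $slope, $intercept + $slope * 12;
```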

Frustration
• Computer systems can be harsh
• Most systems will not scale linearly
• Diminishing returns and lock contention
will punch you in the face
• “Oh but I’ve checked cacti and the CPU
was 25% idle”
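Why “25% idle” can still hurt: under the simplest queuing model, a single M/M/1 queue (an assumption: random arrivals and exponentially distributed service times), residence time grows as R = S / (1 - U). At 75% utilization a request already spends four times its service time in the system, and the curve gets steeper from there. The sketch below tabulates that; `mm1_residence` and the 10 ms service time are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: M/M/1 residence time R = S / (1 - U), valid for 0 <= U < 1.
sub mm1_residence {
    my ($service, $util) = @_;
    die "utilization must be in [0, 1)\n" unless $util >= 0 && $util < 1;
    return $service / (1 - $util);
}

my $s = 0.010;    # hypothetical 10 ms service time
for my $u (0.25, 0.50, 0.75, 0.90, 0.95) {
    my $r = mm1_residence($s, $u);
    printf "U = %.0f%%  R = %.1f ms (%.1fx S)\n", $u * 100, $r * 1000, $r / $s;
}
```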

Let’s put it in the Cloud

• We are moving back to a utility computing
model
• You’re charged for usage
• That makes capacity planning even more
important!!!

Call the experts
• Cost per MIPS
• IBM System/370 model 158-3 - 1.0 MIPS @
1.0 MHz - 1972
• Average purchase price: $771,000*
• No disks or peripherals included
• About $4,082,039 in 2011 dollars
• You needed to squeeze every drop of
processing power
* Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html

Queues
The not so typical performance metrics
• 1961 - CTSS was first
demonstrated at MIT
• 1965 - Allan Scherr used the
machine repairman
problem to model a
time-shared system as
part of Project MAC
• Another offspring of
Project MAC is Multics

Queues

[Diagram: a computer system as a set of queues, one for the CPU and one for the disks]

Queues

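[Diagram: an open or closed queuing network; A arrivals enter at rate λ, wait W in the queue, are served for S (residence time R), and leave as C completions at throughput X]

A Arrival count
λ Arrival rate (A/T)
W Time spent in queue
R Residence time (W + S)
S Service time
X System throughput (C/T)
C Completed tasks count
T Observation period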
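Given raw counts from an observation window, the quantities above follow directly. This sketch uses hypothetical one-hour numbers; it also applies Little's law (N = X·R, the mean number of requests in the system) and the stretch factor R/S, both of which show up again in the PDQ report.

```perl
use strict;
use warnings;

# Hypothetical one-hour observation of a single service.
my $T = 3600;       # observation period, seconds
my $A = 234_500;    # arrival count
my $C = 234_480;    # completed tasks count
my $W = 0.004;      # mean time spent in queue, seconds
my $S = 0.016;      # mean service time, seconds

my $lambda  = $A / $T;    # arrival rate (A/T)
my $X       = $C / $T;    # system throughput (C/T)
my $R       = $W + $S;    # residence time (W + S)
my $N       = $X * $R;    # Little's law: mean number in the system
my $stretch = $R / $S;    # stretch factor

printf "lambda=%.3f/s X=%.3f/s R=%.3fs N=%.3f stretch=%.2f\n",
    $lambda, $X, $R, $N, $stretch;
```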

Arrival Rate (λ)
• Pretty straightforward
• Requests per second/hour/day
• Not the same as throughput (X)
• In a steady state, however:
• A = C as T → ∞
• so λ = X

Service Time (S)

• Time spent actually processing a request
• Web server response time
• Total query time
• IO operation time length


Mythical Performance

• Not gonna happen...
• Don’t believe the vendor’s sales pitch
• “In God we trust, all others must bring
data” - William Edwards Deming

How to measure?

• Apache: %D in mod_log_config
• nginx: $request_time in HttpLogModule
• use Benchmark;
• tcprstat - http://goo.gl/0cbYx
• collectd - http://goo.gl/OXKG7
• metrics - http://goo.gl/gQFVM
• sysstat - http://goo.gl/2aLul
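`use Benchmark;` is geared towards comparing pieces of code. To record per-call wall-clock service times from inside a Perl program, the core `Time::HiRes` module is a simpler fit; this is a swapped-in technique, not one from the list above. `service_time` and the dummy workload are hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: wall-clock time of one call to $code, in seconds.
sub service_time {
    my ($code, @args) = @_;
    my $t0 = [gettimeofday];
    $code->(@args);
    return tv_interval($t0);
}

# Hypothetical unit of work standing in for "serving one request".
my $work = sub { my $x = 0; $x += sqrt($_) for 1 .. 100_000; return $x };

my @samples = map { service_time($work) } 1 .. 20;
my $avg = 0;
$avg += $_ / @samples for @samples;
printf "average service time: %.6f s over %d calls\n", $avg, scalar @samples;
```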

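How to measure?
use strict;
use warnings;

my (%arrivalRate, %serviceTimeAcc);
while (<>) {
    # Apache log line ending in %D: [02/Jul/2010:14:00:18 ... 1863
    my ($date, $svctime) = m/\[(\S+).+?\s(\d+)$/ or next;
    $arrivalRate{$date}++;                 # hits in this one-second bucket
    $serviceTimeAcc{$date} += $svctime;    # summed service time, in μs
}

[02/Jul/2010:14:00:18... 1863

The last field (%D) is the time to serve the request, in microseconds.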

use Chart::Clicker;

Average Hits/s = 65.142
Average Svc time = 0.0159 s
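Turning the per-second hashes into the two averages above could look like this. The sample buckets are hypothetical. The service time is weighted by hits (total microseconds divided by total requests) rather than averaging per-second averages; also note that seconds with no hits never enter `%arrivalRate`, so on sparse traffic the hits/s figure is an overestimate.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Per-second buckets as built by the parsing loop (hypothetical sample).
my %arrivalRate    = ('02/Jul/2010:14:00:18' => 2,      '02/Jul/2010:14:00:19' => 4);
my %serviceTimeAcc = ('02/Jul/2010:14:00:18' => 30_000, '02/Jul/2010:14:00:19' => 60_000);

my $seconds = keys %arrivalRate;            # seconds that saw at least one hit
my $hits    = sum(values %arrivalRate);     # total requests
my $svc_us  = sum(values %serviceTimeAcc);  # total %D, microseconds

my $avg_hits_per_s = $hits / $seconds;
my $avg_svc_time_s = $svc_us / $hits / 1e6; # microseconds -> seconds

printf "Average Hits/s = %.3f\n",      $avg_hits_per_s;
printf "Average Svc time = %.4f s\n", $avg_svc_time_s;
```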

What to look for?

• Stretch factor (residence time / service time)
• Method/Operation
• Geolocation
• Cookies
• Use mod_logio to measure inbound
traffic as well

Modeling
Prediction is very difficult, especially if it’s about
the future.
Niels Bohr
Capacity planning is about setting expectations.
Even wrong expectations are better than no
expectations!
Neil J. Gunther - The Guerrilla Manifesto
http://goo.gl/lZKWH

Modeling

• A model is an abstraction of a complex
system
• A model allows us to observe phenomena
that cannot be easily replicated

Modeling Methods
• Statistics / Trending / Forecasting
• Pros:
• Easy to understand
• Tools readily available
• Cons:
• Hard to create “What-if” scenarios
• Hard to predict contention and
bottlenecks

Modeling Methods
• Queuing Analysis

• Pros:

• Allows you to make predictions when no
production data is available

• Allows you to create “What-if” scenarios

• Cons:

• Sometimes it can be unintuitive

• The math behind it can be difficult

Queues as models
Typical LAMP Stack

[Diagram: clients send requests to Apache, which calls the application, which queries the database; replies flow back to the clients]

Queues as models
What if?

[Diagram: the same stack with a cache placed in front of Apache]

Queues as models
What happens if we use a 15k RPM disk?

[Diagram: a CPU queue feeding a 10k RPM disk queue]
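A what-if like the disk swap can be answered on paper with an open network of queuing centers. Assuming each center behaves like an M/M/1 queue, a request's response time is the sum of D_k / (1 - λ·D_k) over the centers, where D_k is its service demand there. The arrival rate and demands below are hypothetical; with them, cutting disk demand from 12 ms to 8 ms takes the response time from about 36.7 ms to 20.0 ms, a much bigger win than the one-third cut in disk time alone suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: response time of an open network of M/M/1 centers,
# summing D_k / (1 - lambda * D_k) over every service demand D_k.
sub open_response_time {
    my ($lambda, @demands) = @_;
    my $r = 0;
    for my $d (@demands) {
        my $u = $lambda * $d;    # utilization of this center
        die "center saturated (U = $u)\n" if $u >= 1;
        $r += $d / (1 - $u);
    }
    return $r;
}

my $lambda = 50;       # hypothetical arrivals per second
my $cpu    = 0.005;    # hypothetical CPU demand, seconds
my %disk   = ('10k RPM' => 0.012, '15k RPM' => 0.008);    # hypothetical disk demands

for my $model (sort keys %disk) {
    printf "%s disk: R = %.1f ms\n", $model,
        1000 * open_response_time($lambda, $cpu, $disk{$model});
}
```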

Queues as models
m1.small? m1.large? m1.xlarge?

[Diagram: virtual cores × EC2 compute units (CU), sharing the memory bus]

use pdq;

• Available at http://goo.gl/s98wQ (not on
CPAN)
• PDQ is a queuing circuit solver by Neil J.
Gunther
• There’s a whole book about it
http://goo.gl/9MA2c

use pdq;
CreateNode()     Define a queuing center
CreateOpen()     Define a traffic stream of an open circuit
CreateClosed()   Define a traffic stream of a closed circuit
SetDemand()      Define the service demand for each of the queuing centers

use pdq;
Node Types

CEN Queuing Center

DLY Delay Center

use pdq;
Service Disciplines

FCFS First-come first-served

LCFS Last-come first-served

ISRV Infinite Server

PSHR Processor Sharing

use pdq;
• Apache Web Server
• Average Network RTD: 0.00921 seconds
• Added as a delay center in the circuit
• Average Arrival Rate: 65.142 hits/s
• Average Service time: 0.0159 seconds
• 128 worker threads

use pdq;
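use strict;
use warnings;

my $workload      = "httpd";
my $httpMaxClient = 128;       # Apache worker threads
my $arrivalRate   = 65.142;    # hits/s
my $serviceTime   = 0.1159;    # seconds

pdq::Init("web server");
$pdq::streams = pdq::CreateOpen($workload, $arrivalRate);

# Assumption (not shown on the slide): one FCFS queuing center per worker,
# each carrying 1/$httpMaxClient of the service demand; this matches the
# Max throughput bound of 128 / 0.1159 reported below.
for my $i (1 .. $httpMaxClient) {
    $pdq::nodes = pdq::CreateNode("worker$i", $pdq::CEN, $pdq::FCFS);
    pdq::SetDemand("worker$i", $workload, $serviceTime / $httpMaxClient);
}

pdq::Solve($pdq::CANON);
pdq::Report();
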
pdq::Report();
Metric              Value      Unit
------              -----      ----
Workload: "httpd"
Number in system    8.0279     Trans
Mean throughput     65.1420    Trans/Sec
Response time       0.1232     Sec
Stretch factor      1.0626

pdq::Report();

Bounds Analysis:
Max throughput      1104.4003  Trans/Sec
Min response        0.1160     Sec
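The figures in the report can be re-derived in a few lines of plain Perl, which is a useful sanity check. The slide does not show how the 128 workers were entered into the circuit; this sketch assumes one FCFS queuing center per worker, each with 1/128 of the demand, and leaves the network delay center out. Under that assumption the Max throughput bound (128 / 0.1159 ≈ 1104.4) and the 0.1232 s response time come out as reported, and the remaining figures land within about 0.1%. This is not the same as a single 128-server (M/M/m) center.

```perl
use strict;
use warnings;

my $lambda  = 65.142;    # arrival rate, hits/s
my $S       = 0.1159;    # service time used in the pdq script, seconds
my $workers = 128;       # Apache worker threads

# Assumption: one FCFS queuing center per worker, each with demand S / workers.
my $D       = $S / $workers;
my $U       = $lambda * $D;              # utilization of each center
my $R       = $workers * $D / (1 - $U);  # response time: sum of D / (1 - U)
my $N       = $lambda * $R;              # Little's law: number in system
my $stretch = $R / $S;                   # stretch factor
my $Xmax    = 1 / $D;                    # bound set by the largest demand
my $Rmin    = $workers * $D;             # sum of demands, i.e. S

printf "N=%.4f R=%.4f stretch=%.4f Xmax=%.4f Rmin=%.4f\n",
    $N, $R, $stretch, $Xmax, $Rmin;
```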

pdq::Report();

• Average request size: 145 KBytes
• ~1160 Kbits
• @1104 transactions/second:
• 1,280,640 Kbits/s ≈ 1.28 Gbps
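The bandwidth arithmetic above, spelled out. Like the slide, it multiplies bytes by 8 and treats 1 Gbps as 10^6 Kbit/s. On these numbers, running at the throughput bound would need more than a single 1 Gbps link can carry.

```perl
use strict;
use warnings;

my $request_kbytes = 145;                   # average request size, KBytes
my $tps            = 1104;                  # Max throughput from the bounds analysis
my $kbits          = $request_kbytes * 8;   # Kbit per request
my $kbits_per_s    = $kbits * $tps;         # Kbit/s at the bound

printf "%d Kbit/request x %d/s = %d Kbit/s (~%.2f Gbps)\n",
    $kbits, $tps, $kbits_per_s, $kbits_per_s / 1e6;
```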

Resources and References
• CMG Public Proceedings:
http://www.cmg.org/proceedings/
• Measure IT:
http://www.cmg.org/measureit/
• Guerrilla Capacity Planning:
http://www.perfdynamics.com/Classes/Outlines/guerilla.html
• Performance by Design - Menasce, Dowdy,
Almeida - http://amzn.to/mpqfVO
• Capacity Planning for Web Performance:
Metrics, Models, and Methods - Daniel
Menasce, Virgilio Almeida - http://amzn.to/lOATba
• Capacity Planning for Web Services: Metrics,
Models, and Methods - Daniel Menasce,
Virgilio Almeida - http://amzn.to/iClpsB
• Guerrilla Capacity Planning: A Tactical
Approach to Planning for Highly Scalable
Applications and Services - Neil Gunther -
http://amzn.to/kfrfLK
• The Art of Computer Systems Performance
Analysis: Techniques for Experimental Design,
Measurement, Simulation, and Modeling - R.
K. Jain - http://amzn.to/jqud1I
