Capacity Planning for fun & profit
Capacity Planning for fun & profit, as presented in the 2nd São Paulo Perl Mongers Conference

  • Capacity Planning for fun & profit: beyond cacti and top
      • II São Paulo Perl Workshop
      • Rodrigo Albani de Campos - @xinu
  • Agenda
      • Capacity planning primer: a tale of discovery
      • Metrics
      • Queues
      • Models
  • Why Perl?
      • Main reason: I feel comfortable with it
      • Ubiquitous and free
      • Plenty of stable statistics modules available at CPAN
      • Ultimately, it gets the job done
  • Capacity Planning
      • Is just like sex...
          • Everyone wants to do it
          • Many say they’re doing it
          • You always exaggerate how much of it you’re doing
          • Most people aren’t actually doing it (despite their best efforts)
          • Everybody else seems to be doing more than you
  • A tale of discovery
      • There once was a system administrator...
  • A tale of discovery
      • How many servers do we need?
      • Actual capacity?
      • How much memory?
      • What’s the predicted growth?
      • IO capacity?
  • Typical Performance Metrics
      • Load Average - uptime
          • The single most misunderstood metric
      • CPU - mpstat
      • IO - iostat
      • Memory Usage - vmstat
  • Time series charts (I’m looking at you, cacti huggers!)
      • Time series performance data is useful for:
          • Troubleshooting
          • Simplistic forecasting
          • Finding trends
          • Identifying seasonal behavior
      • On its own, this is NOT capacity planning
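The "simplistic forecasting" that time series data supports can be sketched as an ordinary least-squares line fitted to past samples and extrapolated forward. This is a minimal illustration, not something shown in the talk: the `linear_fit` helper and the monthly peak figures are hypothetical.

```perl
use strict;
use warnings;

# Fit y = intercept + slope * x by ordinary least squares.
# Takes two array refs of equal length; returns (intercept, slope).
sub linear_fit {
    my ($xs, $ys) = @_;
    my $n = @$xs;
    die "need at least two points of equal length\n"
        if $n < 2 || $n != @$ys;

    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $i (0 .. $n - 1) {
        $sx  += $xs->[$i];
        $sy  += $ys->[$i];
        $sxx += $xs->[$i] ** 2;
        $sxy += $xs->[$i] * $ys->[$i];
    }
    my $denom = $n * $sxx - $sx ** 2;
    die "x values must not all be equal\n" if $denom == 0;

    my $slope     = ($n * $sxy - $sx * $sy) / $denom;
    my $intercept = ($sy - $slope * $sx) / $n;
    return ($intercept, $slope);
}

# Hypothetical data: peak hits/s observed in months 1..6.
my @month = (1 .. 6);
my @peak  = (40, 44, 47, 52, 55, 60);

my ($intercept, $slope) = linear_fit(\@month, \@peak);
printf "Trend: %.2f hits/s per month; month 12 forecast: %.1f hits/s\n",
    $slope, $intercept + $slope * 12;
```

A straight line like this says nothing about contention or saturation, which is the gap the queueing part of the talk addresses.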
  • Frustration
      • Computer systems can be harsh
      • Most systems will not scale linearly
      • Diminishing returns and lock contention will punch you in the face
      • “Oh but I’ve checked cacti and the CPU was 25% idle”
  • Let’s put it in the Cloud
      • We are moving back to a utility computing model
      • You’re charged per usage
      • Even more important to care about capacity planning!
  • Call the experts
      • Cost per MIPS
          • IBM System/370 model 158-3 - 1.0 MIPS @ 1.0 MHz - 1972
          • Average purchase price: $771,000*
          • No disks or peripherals included
          • Equivalent to $4,082,039 by 2011
      • Need to squeeze every drop of processing power
      * Source:
  • Queues: the not so typical performance metrics
      • 1961 - CTSS was first demonstrated at MIT
      • 1965 - Allan Scherr used the machine repairman problem to model a time-shared system as part of Project MAC
      • Another offspring of Project MAC is Multics
  • Queues: the not so typical performance metrics [diagram: a computer system drawn as queues for its CPU and disks]
  • Queues: the not so typical performance metrics [diagram: an open/closed queueing network annotated with the symbols below]
        A    Arrival count
        λ    Arrival rate (A/T)
        W    Time spent in queue
        S    Service time
        R    Residence time (W+S)
        C    Completed tasks count
        X    System throughput (C/T)
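The definitions above translate directly into arithmetic on the counters from one measurement window. The sketch below assumes T is the length of that window in seconds (the slide uses T without defining it); `queue_metrics` and the sample numbers are hypothetical.

```perl
use strict;
use warnings;

# Operational quantities for one measurement window.
#   T: window length in seconds (assumed meaning of T)
#   A: arrivals counted, C: completions counted
#   W: mean time spent in queue, S: mean service time (seconds)
sub queue_metrics {
    my (%m) = @_;
    die "window length T must be positive\n" unless $m{T} > 0;
    return (
        lambda => $m{A} / $m{T},    # arrival rate
        X      => $m{C} / $m{T},    # system throughput
        R      => $m{W} + $m{S},    # residence time
    );
}

# Hypothetical one-minute window.
my %q = queue_metrics(T => 60, A => 3900, C => 3900, W => 0.002, S => 0.016);
printf "lambda = %.2f/s, X = %.2f/s, R = %.3f s\n", @q{qw(lambda X R)};
```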
  • Arrival Rate (λ)
      • Pretty straightforward
      • Requests per second/hour/day
      • Not the same as throughput (X)
          • Although in a steady state:
              • A = C as T → ∞
              • λ = X
  • Service Time (S)
      • Time spent in processing
          • Web server response time
          • Total query time
          • IO operation time length
  • Mythical Performance [figure: chart; axis labels not recoverable]
      • Not gonna happen...
      • Don’t believe vendor’s sales pitch
      • “In God we trust, all others must bring data” - William Edwards Deming
  • How to measure?
      • Apache: %D in mod_log_config
      • nginx: $request_time in HttpLogModule
      • use Benchmark;
      • tcprstat -
      • collectd -
      • metrics -
      • sysstat -
  • How to measure?
        my ($date, $svctime) = (m/\[(\S+).+?\s(\d+)$/);
        $arrivalRate{$date}++;
        $serviceTimeAcc{$date} += $svctime;
      • Sample log line: [02/Jul/2010:14:00:18... 1863
      • The last field (1863) is the time to serve the request, in microseconds
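The three statements on the slide can be wrapped into a small summary routine. This is a sketch under assumptions rather than the script used in the talk: it assumes the Apache `%D` value is the last field of each log line, the `summarise_log` helper is hypothetical, and the sample lines are made up.

```perl
use strict;
use warnings;

# Summarise access-log lines whose last field is the Apache %D value
# (time to serve the request, in microseconds).
# Returns (average hits per second, average service time in seconds),
# where the hit rate is averaged over seconds with at least one request.
sub summarise_log {
    my ($lines) = @_;
    my (%arrivalRate, %serviceTimeAcc);

    for (@$lines) {
        # $date is the timestamp up to the second; $svctime is the last field.
        my ($date, $svctime) = (m/\[(\S+).+?\s(\d+)$/) or next;
        $arrivalRate{$date}++;
        $serviceTimeAcc{$date} += $svctime;
    }

    my $seconds = keys %arrivalRate;
    return (0, 0) unless $seconds;

    my ($hits, $usec) = (0, 0);
    $hits += $_ for values %arrivalRate;
    $usec += $_ for values %serviceTimeAcc;

    return ($hits / $seconds, $usec / $hits / 1e6);
}

# Hypothetical sample lines; a real run would read the access log instead.
my @sample = (
    '10.0.0.1 - - [02/Jul/2010:14:00:18 -0300] "GET / HTTP/1.1" 200 512 1863',
    '10.0.0.2 - - [02/Jul/2010:14:00:18 -0300] "GET /a HTTP/1.1" 200 900 2137',
    '10.0.0.3 - - [02/Jul/2010:14:00:19 -0300] "GET /b HTTP/1.1" 200 100 2000',
);
my ($hitsPerSec, $avgSvc) = summarise_log(\@sample);
printf "Average Hits/s = %.3f\nAverage Svc time = %.4f\n", $hitsPerSec, $avgSvc;
```

Keeping the per-second hashes, as the slide does, also leaves the data in a shape that can be charted as a time series.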
  • use Chart::Clicker;
  • use Chart::Clicker;
      • Average Hits/s = 65.142
      • Average Svc time = 0.0159
  • What to look for?
      • Stretch factor
      • Method/Operation
      • Geolocation
      • Cookies
          • Use mod_logio to measure inbound traffic as well
  • Modeling
      • “Prediction is very difficult, especially if it’s about the future.” - Niels Bohr
      • “Capacity planning is about setting expectations. Even wrong expectations are better than no expectations!” - Neil J. Gunther, The Guerrilla Manifesto
  • Modeling
      • A model is an abstraction of a complex system
      • A model allows us to observe phenomena that cannot be easily replicated
  • Modeling Methods
      • Statistics / Trending / Forecasting
          • Pros:
              • Easy to understand
              • Tools readily available
          • Cons:
              • Hard to create “What-if” scenarios
              • Hard to predict contention and bottlenecks
  • Modeling Methods
      • Queuing Analysis
          • Pros:
              • Allows you to make predictions when no production data is available
              • Allows you to create “What-if” scenarios
          • Cons:
              • Sometimes it can be unintuitive
              • The math behind it can be difficult
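A "What-if" scenario can be as small as one formula. The sketch below assumes the simplest textbook model, a single M/M/1 queue, where residence time is R = S / (1 - ρ) with utilization ρ = λS. The talk does not say which model it has in mind at this point; `mm1_residence_time` and the 10 ms service time are hypothetical.

```perl
use strict;
use warnings;

# Residence time of a single M/M/1 queue: R = S / (1 - rho), rho = lambda * S.
# Returns nothing (undef in scalar context) when the queue is saturated.
sub mm1_residence_time {
    my ($lambda, $S) = @_;
    my $rho = $lambda * $S;
    return if $rho >= 1;
    return $S / (1 - $rho);
}

# What if traffic grows? Hypothetical single server with S = 10 ms.
my $S = 0.010;
for my $lambda (20, 40, 60, 80, 95) {
    my $R = mm1_residence_time($lambda, $S);
    printf "lambda = %3d/s  rho = %.2f  R = %.1f ms\n",
        $lambda, $lambda * $S, $R * 1000;
}
```

The loop makes the non-linearity visible: residence time grows slowly at low utilization and sharply as ρ approaches 1, which a trend line through past CPU charts would not predict.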
  • Queues as models: typical LAMP stack [diagram: clients send requests to, and receive replies from, a chain of queues: Apache, application, database]
  • Queues as models: what if? [diagram: the same circuit with a cache added between the clients and Apache]
  • Queues as models: what happens if we use a 15k RPM disk? [diagram: a CPU queue feeding a 10k RPM disk queue]
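One way to approach the disk question is to change only the service time and see what the queue does. The sketch below assumes the disk behaves as a single M/M/1 queue and that only the average rotational latency (half a revolution) differs between the two drives. The seek and transfer time and the request rate are assumed values, not measurements from the talk.

```perl
use strict;
use warnings;

# Average rotational latency in seconds: half a revolution.
sub rotational_latency {
    my ($rpm) = @_;
    return 0.5 * 60 / $rpm;
}

# Assumed values, for illustration only.
my $seek_and_transfer = 0.004;    # seconds, held constant for both disks
my $lambda            = 100;      # IO requests per second

for my $rpm (10_000, 15_000) {
    my $S   = $seek_and_transfer + rotational_latency($rpm);
    my $rho = $lambda * $S;
    my $R   = $S / (1 - $rho);    # M/M/1 residence time, valid while rho < 1
    printf "%5d RPM: S = %.1f ms, utilization = %.0f%%, R = %.1f ms\n",
        $rpm, $S * 1000, $rho * 100, $R * 1000;
}
```

Because queueing amplifies service time differences, the improvement in residence time can be larger than the 1 ms saved per operation.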
  • Queues as models: m1.small? m1.large? m1.xlarge? [diagram: virtual cores x EC2 CU, memory, bus]
  • use pdq;
      • Available at (not on CPAN)
      • PDQ is a queuing circuit solver by Neil J. Gunther
      • There’s a whole book about it
  • use pdq;
        CreateNode()     Define a queuing center
        CreateOpen()     Define a traffic stream of an open circuit
        CreateClosed()   Define a traffic stream of a closed circuit
        SetDemand()      Define the service demand for each of the queuing centers
  • use pdq; Node Types
        CEN    Queuing Center
        DLY    Delay Center
  • use pdq; Service Disciplines
        FCFS   First-come first-served
        LCFS   Last-come first-served
        ISRV   Infinite Server
        PSHR   Processor Sharing
  • use pdq;
      • Apache Web Server
      • Average Network RTD: 0.00921 seconds
          • Added as a delay center in the circuit
      • Average Arrival Rate: 65.142 hits/s
      • Average Service time: 0.0159 seconds
      • 128 worker threads
  • use pdq;
        $workload      = "httpd";
        $httpMaxClient = 128;
        pdq::Init("web server");
        $arrivalRate = 65.142;
        $serviceTime = 0.1159;
        $pdq::streams = pdq::CreateOpen($workload, $arrivalRate);
  • pdq::Report();
        Metric              Value       Unit
        ------              -----       ----
        Workload: "httpd"
        Number in system    8.0279      Trans
        Mean throughput     65.1420     Trans/Sec
        Response time       0.1232      Sec
        Stretch factor      1.0626
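The first three figures in the report appear consistent with Little's law, which relates the mean number of requests in the system (N) to throughput (X) and response time (R). The check below is a worked sanity check on the reported numbers, not a description of how PDQ computes them. The small gap from 8.0279 is what rounding R to four decimal places would produce.

```latex
N = X \cdot R \approx 65.1420 \times 0.1232 \approx 8.025
```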
  • pdq::Report();
        Bounds Analysis:
        Max throughput      1104.4003   Trans/Sec
        Min response        0.1160      Sec
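The reported maximum throughput coincides with 128 worker threads divided by a service time of 0.1159 s, the value assigned to `$serviceTime` in the listing, rather than the 0.0159 s average measured earlier. A minimal sketch of that bound, assuming all 128 workers can serve requests in parallel; `max_throughput` is a hypothetical helper, not part of PDQ.

```perl
use strict;
use warnings;

# Upper bound on throughput for a number of parallel servers,
# each taking $S seconds per request: servers / S.
sub max_throughput {
    my ($servers, $S) = @_;
    die "service time must be positive\n" unless $S > 0;
    return $servers / $S;
}

printf "Max throughput %.4f Trans/Sec\n", max_throughput(128, 0.1159);
```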
  • pdq::Report();
      • Average request size: 145 KBytes
          • ~ 1160 Kbits
      • @ 1104 transactions/second:
          • 1,280,640 Kbits/s ~ 1.28 Gbps
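The bandwidth figure follows from multiplying the request size in bits by the transaction rate. A small sketch of that arithmetic, using 1 KByte = 8 Kbits and 1 Gbps = 1,000,000 Kbits/s as the slide does; `required_kbits_per_sec` is a hypothetical helper.

```perl
use strict;
use warnings;

# Bandwidth in Kbits/s needed to move $kbytes per transaction
# at $tps transactions per second.
sub required_kbits_per_sec {
    my ($kbytes, $tps) = @_;
    return $kbytes * 8 * $tps;
}

my $kbps = required_kbits_per_sec(145, 1104);
printf "%d Kbits/s ~ %.2f Gbps\n", $kbps, $kbps / 1_000_000;
```

At the model's maximum throughput the network link, not the web server, may become the limiting resource.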
  • Resources and References
      • CMG Public Proceedings:
      • Measure IT:
      • Guerrilla Capacity Planning Outlines/guerilla.html
      • Performance by Design - Menasce, Dowdy, Almeida -
      • Capacity Planning for Web Performance: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - lOATba
      • Capacity Planning for Web Services: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida -
      • Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services - Neil Gunther -
      • The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling - R. K. Jain -
  • Any questions?