Capacity Planning for fun & profit

Capacity Planning for fun & profit, as presented in the 2nd São Paulo Perl Mongers Conference

Published in: Technology

  1. Capacity Planning for fun & profit: beyond cacti and top
     II São Paulo Perl Workshop
     Rodrigo Albani de Campos - @xinu - camposr@gmail.com
  2. Agenda
       • Capacity planning primer: a tale of discovery
       • Metrics
       • Queues
       • Models
  3. Why Perl?
       • Main reason: I feel comfortable with it
       • Ubiquitous and free
       • Plenty of stable statistics modules available at CPAN
       • Ultimately, it gets the job done
  4. Capacity Planning
       • Is just like sex...
           • Everyone wants to do it
           • Many say they’re doing it
           • You always exaggerate how much of it you’re doing
           • Most people aren’t actually doing it (despite their best efforts)
           • Everybody else seems to be doing more than you
  5. A tale of discovery
     There once was a system administrator...
  6. A tale of discovery
       • How many servers do we need?
       • Actual capacity?
       • How much memory?
       • What’s the predicted growth?
       • IO capacity?
  7. Typical Performance Metrics
       • Load Average - uptime
           • The single most misunderstood metric
       • CPU - mpstat
       • IO - iostat
       • Memory Usage - vmstat
  8. Typical Performance Metrics
  9. Time series charts (I’m looking at you, cacti huggers!)
       • Time series performance data is useful for:
           • Troubleshooting
           • Simplistic forecasting
           • Finding trends
           • Identifying seasonal behavior
       • On its own, this is NOT capacity planning
  10. Frustration
       • Computer systems can be harsh
       • Most systems will not scale linearly
       • Diminishing returns and lock contention will punch you in the face
       • “Oh but I’ve checked cacti and the CPU was 25% idle”
  11. Let’s put it in the Cloud
       • We are moving back to a utility computing model
       • You’re charged per usage
       • Even more important to care about capacity planning!
  12. Call the experts
       • Cost per MIPS
       • IBM System/370 model 158-3 - 1.0 MIPS @ 1.0 MHz - 1972
       • Average purchase price: $771,000*
       • No disks or peripherals included
       • $4,082,039 by 2011
       • Need to squeeze every drop of processing power
     * Source: http://www-03.ibm.com/ibm/history/exhibits/mainframe/mainframe_PP3135.html
  13. Queues: the not so typical performance metrics
       • 1961 - CTSS was first demonstrated at MIT
       • 1965 - Allan Scherr used the machine repairman problem to model a time-shared system as part of Project MAC
       • Another offspring of Project MAC is Multics
  14. Queues: the not so typical performance metrics
     [Diagram: a computer system, with its disks and CPU]
  15. Queues: the not so typical performance metrics
     [Diagram: open/closed network, with arrivals (A, λ) entering, waiting (W), being served (S), and leaving as completions (C, X)]
       A   Arrival count
       λ   Arrival rate (A/T)
       W   Time spent in queue
       R   Residence time (W+S)
       S   Service time
       X   System throughput (C/T)
       C   Completed tasks count
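
The definitions on this slide can be sketched in a few lines of Perl. This is only an illustration: T is assumed to be the length of the observation window, the sub name `queue_metrics` is made up for this example, and every input value is hypothetical rather than taken from the talk.

```perl
use strict;
use warnings;

# Operational quantities from the slide, computed from raw counts.
# Assumption: T is the observation window in seconds.
sub queue_metrics {
    my (%in) = @_;
    my $lambda = $in{arrivals}    / $in{period};   # arrival rate: lambda = A / T
    my $x      = $in{completions} / $in{period};   # throughput:   X = C / T
    my $r      = $in{wait} + $in{service};         # residence:    R = W + S
    return (lambda => $lambda, throughput => $x, residence => $r);
}

# Hypothetical measurements, for illustration only.
my %m = queue_metrics(
    arrivals    => 600,    # A: requests that arrived during the window
    completions => 600,    # C: requests completed during the window
    period      => 60,     # T: observation window, in seconds
    wait        => 0.004,  # W: mean time spent queued, in seconds
    service     => 0.016,  # S: mean service time, in seconds
);

printf "lambda = %.3f req/s\n", $m{lambda};
printf "X      = %.3f req/s\n", $m{throughput};
printf "R      = %.3f s\n",     $m{residence};
```

When arrivals and completions match over the window, as in the sample call, the arrival rate and the throughput come out equal.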
  16. Arrival Rate (λ)
       • Pretty straightforward
       • Requests per second/hour/day
       • Not the same as throughput (X)
           • Although in a steady state:
               • A = C as T → ∞
               • λ = X
  17. Service Time (S)
       • Time spent in processing
           • Web server response time
           • Total query time
           • IO operation time length
  18. Mythical Performance
     [Chart: axis labels and legend garbled in extraction]
  19. Mythical Performance
       • Not gonna happen...
       • Don’t believe vendor’s sales pitch
       • “In God we trust, all others must bring data” - William Edwards Deming
  20. How to measure?
       • Apache: %D in mod_log_config
       • nginx: $request_time in HttpLogModule
       • use Benchmark;
       • tcprstat - http://goo.gl/0cbYx
       • collectd - http://goo.gl/OXKG7
       • metrics - http://goo.gl/gQFVM
       • sysstat - http://goo.gl/2aLul
  21. How to measure?
        my ($date,$svctime) = (m/\[(\S+).+?\s(\d+)$/);
        $arrivalRate{$date}++;
        $serviceTimeAcc{$date} += $svctime;
     Sample log line: [02/Jul/2010:14:00:18... 1863
     The last field (1863) is the time to serve the request, in μseconds.
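
The fragment on this slide can be expanded into a small self-contained sketch. It assumes each log line carries a bracketed timestamp and ends with Apache's %D value in microseconds. The sub name `summarize` and the sample log lines are hypothetical, and the averaging step is one plausible way to get "hits/s" and "service time" figures, not necessarily how the numbers in the talk were produced.

```perl
use strict;
use warnings;

# Aggregate access-log lines into per-second hit counts and service time.
# Assumption: timestamp in square brackets, last field is %D in microseconds.
sub summarize {
    my @lines = @_;
    my (%arrivalRate, %serviceTimeAcc);
    for (@lines) {
        # Capture the timestamp (to one-second resolution) and the trailing number;
        # lines that do not match are skipped.
        my ($date, $svctime) = m/\[(\S+).+?\s(\d+)$/ or next;
        $arrivalRate{$date}++;
        $serviceTimeAcc{$date} += $svctime;
    }
    my $seconds = keys %arrivalRate;   # only seconds with at least one hit
    return unless $seconds;
    my ($hits, $usec) = (0, 0);
    $hits += $_ for values %arrivalRate;
    $usec += $_ for values %serviceTimeAcc;
    return (
        hits_per_sec => $hits / $seconds,
        avg_svc_sec  => $usec / $hits / 1_000_000,   # microseconds to seconds
    );
}

# Hypothetical log lines, for illustration only.
my @sample = (
    '10.0.0.1 - - [02/Jul/2010:14:00:18 -0300] "GET / HTTP/1.1" 200 5120 1863',
    '10.0.0.2 - - [02/Jul/2010:14:00:18 -0300] "GET /a HTTP/1.1" 200 812 2137',
    '10.0.0.3 - - [02/Jul/2010:14:00:19 -0300] "GET /b HTTP/1.1" 200 97 2000',
);
my %s = summarize(@sample);
printf "Average Hits/s   = %.3f\n", $s{hits_per_sec};
printf "Average Svc time = %.4f\n", $s{avg_svc_sec};
```

Seconds with no traffic never show up in the hash, so this average is taken over busy seconds only. Dividing the total hit count by the wall-clock span of the log is the alternative if idle seconds should count.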
  22-26. use Chart::Clicker; [Charts]
       • Average Hits/s = 65.142
       • Average Svc time = 0.0159
  27. What to look for?
       • Stretch factor
       • Method/Operation
       • Geolocation
       • Cookies
           • Use mod_logio to measure inbound traffic as well
  28. Modeling
     “Prediction is very difficult, especially if it’s about the future.” - Niels Bohr
     “Capacity planning is about setting expectations. Even wrong expectations are better than no expectations!” - Neil J. Gunther, The Guerrilla Manifesto, http://goo.gl/lZKWH
  29. Modeling
       • A model is an abstraction of a complex system
       • A model allows us to observe phenomena that cannot be easily replicated
  30. Modeling Methods
       • Statistics / Trending / Forecasting
           • Pros:
               • Easy to understand
               • Tools readily available
           • Cons:
               • Hard to create “What-if” scenarios
               • Hard to predict contention and bottlenecks
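
As an illustration of the trending approach, here is a minimal least-squares trend line in plain Perl. The sub name `linear_trend` and the weekly figures are hypothetical. The talk does not say which statistics module or method was used, so this is a generic sketch rather than the author's tooling.

```perl
use strict;
use warnings;

# Ordinary least-squares fit of y = slope * x + intercept.
sub linear_trend {
    my ($xs, $ys) = @_;
    my $n = @$xs;
    die "need two or more points, with matching x and y\n" if $n < 2 || $n != @$ys;
    my ($sx, $sy, $sxx, $sxy) = (0, 0, 0, 0);
    for my $i (0 .. $n - 1) {
        $sx  += $xs->[$i];
        $sy  += $ys->[$i];
        $sxx += $xs->[$i] ** 2;
        $sxy += $xs->[$i] * $ys->[$i];
    }
    my $den = $n * $sxx - $sx ** 2;
    die "x values must not all be equal\n" if $den == 0;
    my $slope     = ($n * $sxy - $sx * $sy) / $den;
    my $intercept = ($sy - $slope * $sx) / $n;
    return ($slope, $intercept);
}

# Hypothetical weekly average hits/s, for illustration only.
my @week = (1, 2, 3, 4, 5);
my @hits = (50, 54, 57, 62, 65);
my ($slope, $intercept) = linear_trend(\@week, \@hits);
printf "trend: %.2f hits/s per week; week 8 forecast: %.1f hits/s\n",
    $slope, $slope * 8 + $intercept;
```

A straight line extrapolates the load, but it says nothing about how the system behaves once a resource saturates, which is the limitation listed under "Cons".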
  31. Modeling Methods
       • Queuing Analysis
           • Pros:
               • Allows you to make predictions when no production data is available
               • Allows you to create “What-if” scenarios
           • Cons:
               • Sometimes it can be unintuitive
               • The math behind it can be difficult
  32. Queues as models
     [Diagram: typical LAMP stack. Clients send requests and receive replies through Apache, Application, and Database]
  33. Queues as models
     [Diagram: what if? The same stack with a Cache added: Clients, Cache, Apache, Application, Database]
  34. Queues as models
     What happens if we use a 15k RPM disk?
     [Diagram: CPU and a 10k RPM disk]
  35. Queues as models
     m1.small? m1.large? m1.xlarge?
     [Diagram: Virtual Cores X EC2 CU, Memory, Bus]
  36. use pdq;
       • Available at http://goo.gl/s98wQ (not on CPAN)
       • PDQ is a queuing circuit solver by Neil J. Gunther
       • There’s a whole book about it: http://goo.gl/9MA2c
  37. use pdq;
       CreateNode()     Define a queuing center
       CreateOpen()     Define a traffic stream of an open circuit
       CreateClosed()   Define a traffic stream of a closed circuit
       SetDemand()      Define the service demand for each of the queuing centers
  38. use pdq; Node Types
       CEN   Queuing Center
       DLY   Delay Center
  39. use pdq; Service Disciplines
       FCFS   First-come first-served
       LCFS   Last-come first-served
       ISRV   Infinite Server
       PSHR   Processor Sharing
  40. use pdq;
       • Apache Web Server
       • Average Network RTD: 0.00921 seconds
           • Added as a delay center in the circuit
       • Average Arrival Rate: 65.142 hits/s
       • Average Service time: 0.0159 seconds
       • 128 worker threads
  41. use pdq;
        $workload = "httpd";
        $httpMaxClient = 128;
        pdq::Init("web server");
        $arrivalRate = 65.142;
        $serviceTime = 0.1159;
        $pdq::streams = pdq::CreateOpen($workload, $arrivalRate);
  42. pdq::Report();
        Metric             Value      Unit
        ------             -----      ----
        Workload: "httpd"
        Number in system   8.0279     Trans
        Mean throughput    65.1420    Trans/Sec
        Response time      0.1232     Sec
        Stretch factor     1.0626
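
The report figures can be sanity-checked with Little's law, which says the mean number in the system equals throughput times response time. The slides do not mention Little's law, so it is brought in here as a standard queueing result. The sub name `number_in_system` is made up for this sketch.

```perl
use strict;
use warnings;

# Little's law: N = X * R
# (mean number in system = mean throughput * mean response time).
sub number_in_system {
    my ($throughput, $response_time) = @_;
    return $throughput * $response_time;
}

# Throughput and response time as printed in the PDQ report.
my $n = number_in_system(65.1420, 0.1232);
printf "N = %.4f transactions (report: 8.0279)\n", $n;
```

The computed value agrees with the report to about two decimal places. The small gap is consistent with the response time being printed to only four decimals.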
  43. pdq::Report(); Bounds Analysis:
        Max throughput   1104.4003   Trans/Sec
        Min response     0.1160      Sec
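
The slides do not show how PDQ arrives at the throughput bound. The reported figure is, however, consistent with the usual saturation bound for parallel servers: the number of workers divided by the service time. The sketch below assumes that reading, using the 128 workers and the 0.1159 s service time from the model code. The sub name `max_throughput` is made up.

```perl
use strict;
use warnings;

# Saturation bound for m identical parallel servers: X_max = m / S.
# Assumption: this is the bound behind the reported figure; the slides do not say so.
sub max_throughput {
    my ($servers, $service_time) = @_;
    die "service time must be positive\n" unless $service_time > 0;
    return $servers / $service_time;
}

printf "Max throughput = %.4f Trans/Sec (report: 1104.4003)\n",
    max_throughput(128, 0.1159);
```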
  44. pdq::Report();
       • Average request size: 145 KBytes
           • ~ 1160 Kbits
       • @ 1104 transactions / second:
           • 1,280,640 Kbits/s ~ 1.28 Gbps
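
The bandwidth arithmetic on this slide, written out as a short sketch. The sub name `required_kbits_per_sec` is made up, and the conversion uses 8 bits per byte and decimal prefixes, which is what the slide's figures imply.

```perl
use strict;
use warnings;

# Bandwidth needed to sustain a given transaction rate:
# KBytes per request * 8 bits per byte * transactions per second.
sub required_kbits_per_sec {
    my ($kbytes_per_request, $transactions_per_sec) = @_;
    return $kbytes_per_request * 8 * $transactions_per_sec;
}

my $kbps = required_kbits_per_sec(145, 1104);   # figures from the slide
printf "%d Kbits/s (~%.2f Gbps)\n", $kbps, $kbps / 1_000_000;
```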
  45. Resources and References
       • CMG Public Proceedings: http://www.cmg.org/proceedings/
       • Measure IT: http://www.cmg.org/measureit/
       • Guerrilla Capacity Planning: http://www.perfdynamics.com/Classes/Outlines/guerilla.html
  46. Resources and References
       • Performance by Design - Menasce, Dowdy, Almeida - http://amzn.to/mpqfVO
       • Capacity Planning for Web Performance: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - http://amzn.to/lOATba
       • Capacity Planning for Web Services: Metrics, Models, and Methods - Daniel Menasce, Virgilio Almeida - http://amzn.to/iClpsB
  47. Resources and References
       • Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services - Neil Gunther - http://amzn.to/kfrfLK
       • The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling - R. K. Jain - http://amzn.to/jqud1I
  48. Any questions?
