Capacity Planning For LAMP

Presented at the MySQL User's Conference in 2007.

    Capacity Planning For LAMP: Presentation Transcript

    • capacity planning for LAMP • what happens after you’re scalable • MySQL Conf and Expo, April 2007
    • John Allspaw • Engineering Manager (Operations) at flickr (Yahoo!)
    • Yay! • You’re scalable! (or not) • Now you can simply add hardware as you need capacity. • (right ?)
    • But: • How many servers ?
    • BUT, um, wait.... • How many databases ? • How many webservers ? • How much shared storage ? • How many network switches ? • What about caching ? • How many CPUs in all of these ? • How much RAM ? • How many drives in each ? • WHEN should we order all of these ?
    • some stats • - ~35M photos in squid cache (total) • - ~2M photos in squid’s RAM • - ~470M photos, 4 or 5 sizes of each • - 38k req/sec to memcached (12M objects) • - 2 PB raw storage (consumed about 1.5 TB on Sunday)
    • capacity
    • capacity doesn’t mean speed
    • capacity is for business
    • [diagram] Buying • how much: too much / enough for now / not enough • when: too soon / too late
    • 3 main parts • - Planning (what ?/why ?/when ?) • - Deployment (install/config/manage) • - Measurement (graph the world)
    • boring queueing theory • Forced Flow Law: $X_i = V_i \times X_0$ • Little’s Law: $N = X \times R$ • Service Demand Law: $D_i = V_i \times S_i = U_i / X_0$
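As a worked instance of Little's Law, take the 1500 qps peak quoted later in the deck as the throughput X, and assume, purely for illustration, an average response time R of 20 ms (that figure is not from the talk). The law then gives the average number of queries in flight:

```latex
N = X \times R = 1500\ \text{queries/s} \times 0.020\ \text{s} = 30\ \text{queries in flight}
```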
    • my theory • capacity planning math is based on real things, not abstract ones.
    • predicting the future
    • consumable
    • concurrent usage
    • considerations: social applications • - Have the ‘network effect’ • - Exponential growth
    • considerations: social applications • Event-related growth • (press, news event, social trends, etc.) • Examples: • London bombing, holidays, tsunamis, etc.
    • What do you have NOW ? • When will your current capacity be depleted or outgrown ?
    • finding ceilings • MySQL (disk IO ?) • SQUID (disk IO ? or CPU ?) • memcached (CPU ? or network ?)
    • forget benchmarks • boring • to use in capacity planning...not usually worth the time • not representative of real load
    • test in production
    • what do you expect ? • define what is acceptable • examples: • squid hits should take less than X milliseconds • SQL queries less than Y milliseconds, and also keep up with replication
    • measurement
    • accept the observer effect • measurement is a necessity. • it’s not optional.
    • http://ganglia.sf.net
    • [diagram: Ganglia layout] gmetad collects XML over TCP • db1, db2, db3: xml over UDP on 239.2.11.84 (multicast) • www1, www2, www3: xml over UDP on 239.2.11.83 (multicast)
    • [diagram: the same Ganglia layout, with a node failure marked “boom!”]
    • super simple graphing
      #!/bin/sh
      /usr/bin/iostat -x 4 2 sda | grep -v ^$ | tail -4 > /tmp/disk-io.tmp
      UTIL=`grep sda /tmp/disk-io.tmp | awk '{print $14}'`
      /usr/bin/gmetric -t uint16 -n disk-util -v$UTIL -u '%'
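The script above reads disk utilisation from a fixed column (`$14`), and the column layout of `iostat -x` differs between sysstat versions. A minimal sketch of a more defensive variant follows; it looks up the `%util` column by its header name. The function name `extract_util` is hypothetical, and rounding to a whole number is an assumption made to match the `uint16` type the slide passes to `gmetric`.

```sh
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and print the %util
# value for one device, rounded to a whole number.
# The column is located by its header name rather than by position.
# If several reports are present, the last one wins, because the first
# report from iostat is the average since boot.
extract_util() {
    awk -v dev="$1" '
        {
            # Header lines contain the literal field "%util"; remember its index.
            for (i = 1; i <= NF; i++)
                if ($i == "%util") col = i
        }
        $1 == dev && col { util = $col }   # keep the most recent sample
        END { if (util != "") printf "%d\n", util + 0.5 }
    '
}

# Usage, mirroring the script on the slide:
#   UTIL=`/usr/bin/iostat -x 4 2 sda | extract_util sda`
#   /usr/bin/gmetric -t uint16 -n disk-util -v$UTIL -u '%'
```

If the device never appears in the input, the function prints nothing, so the caller can skip the `gmetric` call instead of publishing a bogus value.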
    • memcached
    • what if you have graphs but no raw data ? • GraphClick • http://www.arizona-software.ch/applications/graphclick/en/
    • application usage • Usage stats are just as important • as server stats! • Examples: • # of user registrations • # of photos uploaded every hour
    • not a straight line
    • another not straight line
    • but straight relationships!
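One way to read the three slides above: growth over time is not linear, but the relationship between an application metric and a server metric can be. A minimal sketch of checking that with an ordinary least-squares fit follows. The function name `fit_line` and the choice of metrics are assumptions for illustration, not something the talk specifies.

```sh
#!/bin/sh
# Hypothetical helper: least-squares fit y = a + b*x.
# Input: one "x y" pair per line on stdin, e.g. photos uploaded per hour (x)
# and database queries per second (y) sampled over the same hour.
# Output: "slope intercept" on one line.
fit_line() {
    awk '
        NF >= 2 { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
        END {
            d = n * sxx - sx * sx
            if (n < 2 || d == 0) exit 1     # need at least two distinct x values
            b = (n * sxy - sx * sy) / d     # slope
            a = (sy - b * sx) / n           # intercept
            printf "%.4f %.4f\n", b, a
        }
    '
}
```

Feeding it paired samples from the usage and server graphs gives a slope (server load per unit of application usage) that can then be projected forward against a measured ceiling.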
    • measurement examples
    • queries
    • disk I/O
    • What we know now • we can do at least 1500 qps (peak) without: - slave lag - unacceptable avg response time - waiting on disk IO
    • MySQL capacity 1. find ceilings of existing h/w 2. tie app usage to server stats 3. find ceiling:usage ratio 4. do this again: - regularly (monthly) - when new features are released - when new h/w is deployed
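Steps 1 to 3 above can be sketched as a small calculation: divide the measured ceiling by current peak usage, then estimate how long until usage reaches the ceiling. The function name `months_to_ceiling` is hypothetical, and compound monthly growth is an assumption chosen because the deck describes social applications as growing exponentially. Only the 1500 qps ceiling comes from the talk; any current-usage or growth figure is yours to measure.

```sh
#!/bin/sh
# Hypothetical helper: ceiling:usage ratio and time left before the ceiling.
#   $1 = measured ceiling        (e.g. 1500 qps, from the deck)
#   $2 = current peak usage      (same unit; must be measured)
#   $3 = monthly growth fraction (0.10 means 10% per month; an assumption)
# Output: "ratio months" on one line.
months_to_ceiling() {
    awk -v ceiling="$1" -v current="$2" -v growth="$3" 'BEGIN {
        if (current <= 0 || growth <= 0) exit 1
        ratio = ceiling / current
        # Solve current * (1 + growth)^m = ceiling for m.
        months = (ratio <= 1) ? 0 : log(ratio) / log(1 + growth)
        printf "%.2f %.1f\n", ratio, months
    }'
}

# Usage: months_to_ceiling 1500 900 0.10
```

Re-running this whenever the ceiling is re-measured (monthly, after feature releases, after new hardware) keeps the estimate tied to observed numbers rather than to a one-off benchmark.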
    • caching maximums
    • caching ceilings squid, memcache • working-set specific: • - tiny enough to all fit in memory ? • - some/more/all on disk ? • - watch LRU churn
    • churning full caches • Ceilings at: • - LRU ref age small enough to affect hit ratio too much • - Request rate large enough to affect disk IO (to 100%)
    • squid requests and hits
    • squid hit ratio
    • LRU reference age
    • hit response times
    • What we know now • we can do at least 620 req/sec (peak) without: - LRU affecting hit ratio - unacceptable avg response time - waiting too much on disk IO
    • not full caches • (working set smaller than max size) • - request rate large enough to bring network or CPU to 100%
    • deployment
    • Automated Deploy Tools • SystemImager/SystemConfigurator: • - http://wiki.systemimager.org • CVSup: • - http://www.cvsup.org • Subcon: • - http://code.google.com/p/subcon/
    • questions ? • http://flickr.com/photos/gaspi/62165296/ • http://flickr.com/photos/marksetchell/27964330/ • http://flickr.com/photos/sheeshoo/72709413/ • http://flickr.com/photos/jaxxon/165559708/ • http://flickr.com/photos/bambooly/298632541/ • http://flickr.com/photos/colloidfarl/81564759/ • http://flickr.com/photos/sparktography/75499095/