Capacity Planning For LAMP

Presented at the MySQL User's Conference in 2007.

Published in: Technology


  1. 1. capacity planning for LAMP: what happens after you’re scalable • MySQL Conf and Expo, April 2007
  2. 2. John Allspaw • Engineering Manager (Operations) at flickr (Yahoo!)
  3. 3. Yay! • You’re scalable! (or not) • Now you can simply add hardware as you need capacity. • (right ?)
  4. 4. • But: • How many servers ?
  5. 5. BUT, um, wait.... • How many databases ? • How many webservers ? • How much shared storage ? • How many network switches ? • What about caching ? • How many CPUs in all of these ? • How much RAM ? • How many drives in each ? • WHEN should we order all of these ?
  6. 6. some stats • - ~35M photos in squid cache (total) • - ~2M photos in squid’s RAM • - ~470M photos, 4 or 5 sizes of each • - 38k req/sec to memcached (12M objects) • - 2 PB raw storage (consumed ~1.5TB on Sunday)
  7. 7. capacity
  8. 8. capacity doesn’t mean speed
  9. 9. capacity is for business
  10. 10. [Diagram: buying enough for now. Labels: too much, enough, not enough; too soon, too late]
  11. 11. 3 main parts • - Planning (what ?/why ?/when ?) • - Deployment (install/config/manage) • - Measurement (graph the world)
  12. 12. boring queueing theory • Forced Flow Law: $X_i = V_i \times X_0$ • Little’s Law: $N = X \times R$ • Service Demand Law: $D_i = V_i \times S_i = U_i / X_0$
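As a rough illustration of the three laws named on this slide (Forced Flow, Little's, Service Demand), here is a minimal sketch that plugs in numbers. Every value below (200 page requests/sec, 5 queries per page, 0.8 ms per query, 250 ms response time) is a made-up assumption for the arithmetic, not a figure from the talk.

```python
# Operational laws applied to hypothetical numbers.
# X0 = system throughput, Vi = visits to resource i per request,
# Si = service time per visit, Di = service demand, Ui = utilization.

def forced_flow(visits_per_request: float, system_throughput: float) -> float:
    """Forced Flow Law: X_i = V_i * X_0 (throughput seen by resource i)."""
    return visits_per_request * system_throughput


def littles_law(throughput: float, response_time: float) -> float:
    """Little's Law: N = X * R (average number of requests in the system)."""
    return throughput * response_time


def service_demand(visits_per_request: float, service_time: float) -> float:
    """Service Demand Law: D_i = V_i * S_i (resource time per request)."""
    return visits_per_request * service_time


# Hypothetical workload, for illustration only.
x0 = 200.0      # page requests per second
v_db = 5.0      # database queries per page request
s_db = 0.0008   # seconds of database time per query
r = 0.25        # average page response time in seconds

x_db = forced_flow(v_db, x0)       # queries per second hitting the database
d_db = service_demand(v_db, s_db)  # seconds of database time per page
u_db = d_db * x0                   # rearranged D_i = U_i / X_0
n = littles_law(x0, r)             # page requests in flight on average

print(f"db throughput: {x_db:.0f} qps")
print(f"db demand per page: {d_db * 1000:.1f} ms")
print(f"db utilization: {u_db:.0%}")
print(f"requests in flight: {n:.0f}")
```

The utilization line is the Service Demand Law read backwards: once demand per request is known, multiplying by throughput shows how close a resource is to saturation.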
  13. 13. my theory • capacity planning math is based on real things, not abstract ones.
  14. 14. predicting the future
  15. 15. consumable
  16. 16. concurrent usage
  17. 17. considerations: social applications • - Have the ‘network effect’ • - Exponential growth
  18. 18. considerations: social applications • Event-related growth • (press, news event, social trends, etc.) • Examples: • London bombing, holidays, tsunamis, etc. • •
  19. 19. What do you have NOW ? • When will your current capacity be depleted or outgrown ?
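For a consumable resource such as storage, the "when will it be depleted" question can be sketched as simple arithmetic. The free capacity, the daily growth rate of consumption, and the ordering lead time below are hypothetical. The 1.5 TB/day figure echoes the "some stats" slide but is used here only as an example input.

```python
import math


def days_until_full_linear(free_tb: float, daily_tb: float) -> float:
    """Days left if consumption stays flat at daily_tb per day."""
    return free_tb / daily_tb


def days_until_full_compound(free_tb: float, daily_tb: float,
                             daily_growth: float) -> float:
    """Days left if daily consumption itself grows by daily_growth per day.

    Consumption on day k is daily_tb * (1 + g)**k, so the total after n days
    is daily_tb * ((1 + g)**n - 1) / g. Solving that for n gives the result.
    """
    g = daily_growth
    return math.log(1 + free_tb * g / daily_tb) / math.log(1 + g)


def order_deadline(days_left: float, lead_time_days: float) -> float:
    """Days from now by which hardware must be ordered (never negative)."""
    return max(0.0, days_left - lead_time_days)


# Hypothetical inputs, for illustration only.
free_tb = 600.0        # unused storage
daily_tb = 1.5         # consumed per day right now
daily_growth = 0.002   # consumption rate grows 0.2% per day
lead_time_days = 45.0  # time from purchase order to racked hardware

linear_days = days_until_full_linear(free_tb, daily_tb)
compound_days = days_until_full_compound(free_tb, daily_tb, daily_growth)
order_in_days = order_deadline(compound_days, lead_time_days)

print(f"flat consumption: full in {linear_days:.0f} days")
print(f"growing consumption: full in {compound_days:.0f} days")
print(f"place the order within {order_in_days:.0f} days")
```

The gap between the two estimates is the point: with growth that compounds, a straight-line projection overstates how long current capacity will last.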
  20. 20. finding ceilings • MySQL (disk IO ?) • SQUID (disk IO ? or CPU ?) • memcached (CPU ? or network ?)
  21. 21. forget benchmarks • boring • not usually worth the time for capacity planning • not representative of real load
  22. 22. • test in production
  23. 23. what do you expect ? • define what is acceptable • examples: • squid hits should take less than X milliseconds • SQL queries less than Y milliseconds, and also keep up with replication
  24. 24. measurement
  25. 25. accept the observer effect • measurement is a necessity. • it’s not optional.
  26. 26. http://ganglia.sf.net
  27. 27. [Diagram: Ganglia layout. Labels: gmetad; db1, db2, db3; www1, www2, www3; XML over TCP; xml over UDP on 239.2.11.84 (multicast); xml over UDP on 239.2.11.83 (multicast)]
  28. 28. [Diagram: same Ganglia layout as the previous slide, with one node marked "boom!"]
  29. 29. super simple graphing
      #!/bin/sh
      /usr/bin/iostat -x 4 2 sda | grep -v ^$ | tail -4 > /tmp/disk-io.tmp
      UTIL=`grep sda /tmp/disk-io.tmp | awk '{print $14}'`
      /usr/bin/gmetric -t uint16 -n disk-util -v$UTIL -u '%'
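The script on this slide reads disk utilization from a fixed column (`$14`), which only works while `iostat -x` prints that exact column layout. As a hedged alternative, the sketch below finds the `%util` column by its header name and keeps the last report for the device. The sample text is invented for illustration and is not real output from any host. The gmetric arguments are only the flags already shown on the slide, and the command is built but not executed.

```python
def parse_util(iostat_text: str, device: str) -> float:
    """Return %util for `device` from the last report in `iostat -x` output."""
    util_col = None
    value = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Locate the column by name instead of assuming a position.
            util_col = fields.index("%util")
        elif fields[0] == device and util_col is not None:
            # Later reports overwrite earlier ones, so the last one wins.
            value = float(fields[util_col])
    if value is None:
        raise ValueError(f"no %util value found for device {device!r}")
    return value


# Invented sample with two reports, mimicking `iostat -x 4 2 sda`.
SAMPLE = """\
Device:  rrqm/s wrqm/s  r/s   w/s  rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda      0.10   9.80    2.10  30.5 150.2  322.4  75.1  161.2 14.50    0.30     9.10  2.00  6.52

Device:  rrqm/s wrqm/s  r/s   w/s  rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sda      0.00   12.50   3.00  45.0 192.0  460.0  96.0  230.0 13.58    0.42     8.75  2.10  10.08
"""

util = parse_util(SAMPLE, "sda")

# Same gmetric flags as the slide. The value is rounded because the metric
# type is uint16.
command = ["/usr/bin/gmetric", "-t", "uint16", "-n", "disk-util",
           "-v", str(round(util)), "-u", "%"]
print(util, command)
```

To send the metric for real, the `command` list could be handed to `subprocess.run`. That step is left out here so the sketch runs on a machine without Ganglia installed.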
  30. 30. memcached
  31. 31. what if you have graphs but no raw data ? • GraphClick • http://www.arizona-software.ch/applications/graphclick/en/
  32. 32. application usage • Usage stats are just as important • as server stats! • Examples: • # of user registrations • # of photos uploaded every hour
  33. 33. not a straight line
  34. 34. another not straight line
  35. 35. but straight relationships!
  36. 36. measurement examples
  37. 37. queries
  38. 38. disk I/O
  39. 39. What we know now • we can do at least 1500 qps (peak) without: - slave lag - unacceptable avg response time - waiting on disk IO
  40. 40. MySQL capacity 1. find ceilings of existing h/w 2. tie app usage to server stats 3. find ceiling:usage ratio 4. do this again: - regularly (monthly) - when new features are released - when new h/w is deployed
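Steps 2 and 3 on this slide (tie app usage to server stats, then find the ceiling:usage ratio) can be sketched with an ordinary least-squares line. The upload and query figures are invented so the arithmetic is easy to follow. The 1500 qps ceiling is taken from the "What we know now" slide, and the choice of photo uploads per hour as the usage metric is an assumption.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def usage_at_ceiling(slope: float, intercept: float, ceiling: float) -> float:
    """App usage level at which the server stat reaches its ceiling."""
    return (ceiling - intercept) / slope


# Hypothetical paired observations, for illustration only.
uploads_per_hour = [10000.0, 20000.0, 30000.0, 40000.0]  # app usage
peak_qps = [300.0, 500.0, 700.0, 900.0]                  # server stat

slope, intercept = fit_line(uploads_per_hour, peak_qps)
max_uploads = usage_at_ceiling(slope, intercept, ceiling=1500.0)
headroom = max_uploads / uploads_per_hour[-1]  # ceiling:usage ratio

print(f"qps = {slope:.4f} * uploads/hour + {intercept:.0f}")
print(f"ceiling reached near {max_uploads:.0f} uploads/hour")
print(f"ceiling:usage ratio at current load: {headroom:.2f}")
```

Re-running the fit monthly, after feature releases, and after new hardware is deployed (step 4) matters because both the slope and the ceiling can shift.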
  41. 41. caching maximums
  42. 42. caching ceilings squid, memcache • working-set specific: • - tiny enough to all fit in memory ? • - some/more/all on disk ? • - watch LRU churn
  43. 43. churning full caches • Ceilings at: • - LRU ref age small enough to affect hit ratio too much • - Request rate large enough to affect disk IO (to 100%)
  44. 44. squid requests and hits
  45. 45. squid hit ratio
  46. 46. LRU reference age
  47. 47. hit response times
  48. 48. What we know now • we can do at least 620 req/sec (peak) without: - LRU affecting hit ratio - unacceptable avg response time - waiting too much on diskIO
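A statement like "at least N req/sec without ..." can be derived from production measurements by walking up the observed request rates until one of the acceptance criteria fails. The sketch below shows one way to do that. The sample values and all three thresholds are hypothetical, chosen so that the result lines up with the 620 req/sec on this slide.

```python
from typing import Iterable, NamedTuple, Optional


class Sample(NamedTuple):
    req_per_sec: float
    hit_ratio: float        # cache hits / requests
    avg_response_ms: float  # average hit response time
    disk_util_pct: float    # disk utilization


def acceptable(s: Sample,
               min_hit_ratio: float = 0.75,
               max_response_ms: float = 100.0,
               max_disk_util_pct: float = 90.0) -> bool:
    """True when a sample meets every (hypothetical) acceptance threshold."""
    return (s.hit_ratio >= min_hit_ratio
            and s.avg_response_ms <= max_response_ms
            and s.disk_util_pct <= max_disk_util_pct)


def observed_ceiling(samples: Iterable[Sample]) -> Optional[float]:
    """Highest request rate reached before any sample breaks a threshold."""
    ceiling = None
    for s in sorted(samples, key=lambda s: s.req_per_sec):
        if not acceptable(s):
            break
        ceiling = s.req_per_sec
    return ceiling


# Hypothetical peak-hour measurements, for illustration only.
samples = [
    Sample(620.0, 0.80, 60.0, 78.0),
    Sample(300.0, 0.82, 40.0, 35.0),
    Sample(700.0, 0.71, 140.0, 97.0),
    Sample(450.0, 0.81, 48.0, 55.0),
]

print(observed_ceiling(samples))
```

The result is a lower bound on capacity ("at least"), since it only reflects load the servers have already seen in production.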
  49. 49. not full caches • (working set smaller than max size) • - request rate large enough to bring network or CPU to 100%
  50. 50. deployment
  51. 51. Automated Deploy Tools • SystemImager/SystemConfigurator: • - http://wiki.systemimager.org • CVSup: • - http://www.cvsup.org • Subcon: • - http://code.google.com/p/subcon/
  52. 52. questions ? •http://flickr.com/photos/gaspi/62165296/ •http://flickr.com/photos/marksetchell/27964330/ •http://flickr.com/photos/sheeshoo/72709413/ •http://flickr.com/photos/jaxxon/165559708/ •http://flickr.com/photos/bambooly/298632541/ •http://flickr.com/photos/colloidfarl/81564759/ •http://flickr.com/photos/sparktography/75499095/
