Capacity Planning For LAMP

Presented at the MySQL User's Conference in 2007.

Transcript

  • 1. capacity planning for LAMP: what happens after you’re scalable • MySQL Conf and Expo, April 2007
  • 2. John Allspaw • Engineering Manager (Operations) at flickr (Yahoo!)
  • 3. Yay! • You’re scalable! (or not) • Now you can simply add hardware as you need capacity. • (right?)
  • 4. But: • How many servers?
  • 5. BUT, um, wait... • How many databases? • How many webservers? • How much shared storage? • How many network switches? • What about caching? • How many CPUs in all of these? • How much RAM? • How many drives in each? • WHEN should we order all of these?
  • 6. some stats • - ~35M photos in squid cache (total) • - ~2M photos in squid’s RAM • - ~470M photos, 4 or 5 sizes of each • - 38k req/sec to memcached (12M objects) • - 2 PB raw storage (~1.5 TB consumed on Sunday)
  • 7. capacity
  • 8. capacity doesn’t mean speed
  • 9. capacity is for business
  • 10. Buying enough for now • too much vs. not enough • too soon vs. too late
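Once a forecast exists, the "too soon / too late" tension reduces to simple arithmetic. The sketch below is a minimal illustration, assuming you already know how many weeks remain before usage reaches a measured ceiling; the helper name `order_in_weeks`, the procurement lead time and the safety margin are hypothetical inputs, not figures from the talk.

```shell
#!/bin/sh
# order_in_weeks WEEKS_TO_CEILING LEAD_WEEKS SAFETY_WEEKS   (hypothetical helper)
# Ordering earlier than the result buys hardware "too soon"; later risks "too late".
order_in_weeks() {
    awk -v t="$1" -v lead="$2" -v pad="$3" 'BEGIN {
        w = t - lead - pad        # weeks of slack before the order must go out
        if (w <= 0) print "order now"
        else print w
    }'
}

order_in_weeks 36 6 4   # prints 26
```

The safety margin absorbs forecast error; with event-driven growth it is worth keeping generous.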
  • 11. 3 main parts • - Planning (what?/why?/when?) • - Deployment (install/config/manage) • - Measurement (graph the world)
  • 12. boring queueing theory • Forced Flow Law: $X_i = V_i \times X_0$ • Little’s Law: $N = X \times R$ • Service Demand Law: $D_i = V_i \times S_i = U_i / X_0$
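To make the three laws concrete, here is a minimal sketch that evaluates them with `awk`. The helper names are hypothetical, and so are the numbers: 500 page views/sec, 3 queries per page, 0.4 ms of disk time per query, 60% disk utilization and a 20 ms response time. They are chosen so that the two forms of the Service Demand Law agree.

```shell
#!/bin/sh
# The operational laws from the slide, using awk for floating-point math.
# Units: throughput in requests/sec, times in seconds, utilization as a fraction.

# Little's Law: N = X * R (average number of requests in the system)
little_n() { awk -v x="$1" -v r="$2" 'BEGIN { print x * r }'; }

# Forced Flow Law: X_i = V_i * X_0 (throughput seen by device i)
forced_flow() { awk -v v="$1" -v x0="$2" 'BEGIN { print v * x0 }'; }

# Service Demand Law, both forms: D_i = V_i * S_i and D_i = U_i / X_0
demand_from_visits() { awk -v v="$1" -v s="$2" 'BEGIN { print v * s }'; }
demand_from_util()   { awk -v u="$1" -v x0="$2" 'BEGIN { print u / x0 }'; }

forced_flow 3 500              # prints 1500  (queries/sec reaching the database)
little_n 1500 0.02             # prints 30    (queries in flight at 20 ms each)
demand_from_visits 3 0.0004    # prints 0.0012 (disk seconds per page view)
demand_from_util 0.6 500       # prints 0.0012 (same demand, from utilization)
```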
  • 13. my theory • capacity planning math is based on real things, not abstract ones.
  • 14. predicting the future
  • 15. consumable
  • 16. concurrent usage
  • 17. considerations: social applications • - Have the ‘network effect’ • - Exponential growth
  • 18. considerations: social applications • Event-related growth (press, news events, social trends, etc.) • Examples: • London bombing, holidays, tsunamis, etc.
  • 19. What do you have NOW? • When will your current capacity be depleted or outgrown?
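One rough way to answer "when will it be depleted?" is to project the current peak forward until it meets a measured ceiling. This sketch assumes steady compound monthly growth, which the later "not a straight line" slides warn is only an approximation, so the estimate should be recomputed as new data arrives. The helper name and the example figures (1000 qps today, a 1500 qps ceiling, 5% growth per month) are hypothetical.

```shell
#!/bin/sh
# months_to_ceiling CURRENT_PEAK CEILING MONTHLY_GROWTH   (hypothetical helper)
# Assumes compound growth: peak * (1 + g)^m = ceiling  =>  m = ln(ceiling / peak) / ln(1 + g)
months_to_ceiling() {
    awk -v cur="$1" -v top="$2" -v g="$3" 'BEGIN {
        if (cur >= top) { print "0.0"; exit }    # already at (or past) the ceiling
        if (g <= 0)     { print "never"; exit }  # flat or shrinking usage
        printf "%.1f\n", log(top / cur) / log(1 + g)
    }'
}

months_to_ceiling 1000 1500 0.05   # prints 8.3
```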
  • 20. finding ceilings • MySQL (disk I/O?) • squid (disk I/O? or CPU?) • memcached (CPU? or network?)
  • 21. forget benchmarks • boring • for capacity planning, not usually worth the time • not representative of real load
  • 22. test in production
  • 23. what do you expect? • define what is acceptable • examples: • squid hits should take less than X milliseconds • SQL queries should take less than Y milliseconds and also keep up with replication
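Once "acceptable" is written down as a number, checking it can be automated. A minimal sketch, assuming response times are already available as one value in milliseconds per line; the helper name `check_avg_ms` and the sample values are hypothetical. It checks only the average, matching how the slides phrase their limits; averages can hide slow outliers.

```shell
#!/bin/sh
# check_avg_ms LIMIT_MS   (hypothetical helper)
# Reads one response time in milliseconds per line on stdin and compares the
# average against the agreed limit (the X or Y from the slide).
check_avg_ms() {
    awk -v limit="$1" '
        { sum += $1; n++ }
        END {
            if (n == 0) { print "NO DATA"; exit 1 }
            avg = sum / n
            status = (avg <= limit + 0) ? "OK" : "FAIL"
            printf "%s avg=%.1fms limit=%sms\n", status, avg, limit
        }'
}

printf '10\n20\n30\n' | check_avg_ms 25   # prints OK avg=20.0ms limit=25ms
```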
  • 24. measurement
  • 25. accept the observer effect • measurement is a necessity. • it’s not optional.
  • 26. http://ganglia.sf.net
  • 27. [diagram: gmetad collects XML over TCP; db1, db2, db3 exchange XML over UDP on 239.2.11.84 (multicast); www1, www2, www3 exchange XML over UDP on 239.2.11.83 (multicast)]
  • 28. [same diagram, annotated "boom!"]
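  • 29. super simple graphing
    #!/bin/sh
    # two extended samples of sda, 4 s apart; keep only the last report
    /usr/bin/iostat -x 4 2 sda | grep -v '^$' | tail -4 > /tmp/disk-io.tmp
    # %util is the last field of the sda line; $NF avoids hard-coding a column
    UTIL=`grep sda /tmp/disk-io.tmp | awk '{print $NF}'`
    # %util is fractional, so publish it as a float
    /usr/bin/gmetric -t float -n disk-util -v "$UTIL" -u '%'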
  • 30. memcached
  • 31. what if you have graphs but no raw data? • GraphClick • http://www.arizona-software.ch/applications/graphclick/en/
  • 32. application usage • Usage stats are just as important as server stats! • Examples: • # of user registrations • # of photos uploaded every hour
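Application usage can be counted in the same spirit as server stats. This sketch assumes a hypothetical application log with one event per line in the form `YYYY-MM-DDTHH:MM:SS event`; the log format, the `upload` event name and the helper name are all illustrative. Per-hour counts like these could then be published with gmetric alongside the disk metrics.

```shell
#!/bin/sh
# uploads_per_hour   (hypothetical helper)
# stdin:  app log lines in an assumed format "YYYY-MM-DDTHH:MM:SS event"
# stdout: "YYYY-MM-DDTHH count" for every hour that had "upload" events
uploads_per_hour() {
    awk '$2 == "upload" { n[substr($1, 1, 13)]++ }   # key on date + hour
         END { for (h in n) print h, n[h] }' | sort
}

printf '2007-04-24T13:05:12 upload\n2007-04-24T13:40:00 upload\n2007-04-24T14:02:00 upload\n' | uploads_per_hour
```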
  • 33. not a straight line
  • 34. another not straight line
  • 35. but straight relationships!
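Usage over time curves, but a server metric plotted against an application metric often looks linear, and a straight line can be fitted with ordinary least squares. A minimal sketch, assuming pairs of (photos uploaded per hour, MySQL qps) collected over the same intervals; the helper name and sample data are hypothetical.

```shell
#!/bin/sh
# fit_line   (hypothetical helper)
# stdin:  "app_usage server_metric" pairs, e.g. photos uploaded/hour and MySQL qps
# stdout: ordinary least-squares slope and intercept of server_metric vs app_usage
fit_line() {
    awk '{ n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
         END {
             d = n * sxx - sx * sx
             if (n < 2 || d == 0) { print "insufficient data"; exit 1 }
             slope = (n * sxy - sx * sy) / d
             icpt = (sy - slope * sx) / n
             printf "slope=%.3f intercept=%.3f\n", slope, icpt
         }'
}

printf '100 210\n200 410\n300 610\n' | fit_line   # prints slope=2.000 intercept=10.000
```

The slope is the server cost of one more unit of usage; a slope that shifts after a release is a sign the ratio needs re-measuring.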
  • 36. measurement examples
  • 37. queries
  • 38. disk I/O
  • 39. What we know now • we can do at least 1500 qps (peak) without: • - slave lag • - unacceptable avg response time • - waiting on disk I/O
  • 40. MySQL capacity • 1. find ceilings of existing h/w • 2. tie app usage to server stats • 3. find ceiling:usage ratio • 4. do this again: • - regularly (monthly) • - when new features are released • - when new h/w is deployed
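Steps 1 to 3 can be combined into one number: how much application usage the current hardware can absorb. This sketch takes the 1500 qps ceiling from the slides and assumes a hypothetical observed peak of 900 qps while users uploaded 30,000 photos/hour, with load scaling proportionally with usage; the helper name is hypothetical too.

```shell
#!/bin/sh
# max_usage CEILING OBSERVED_PEAK USAGE_AT_PEAK   (hypothetical helper)
# ratio = server load per unit of app usage; the ceiling divided by that ratio
# is the app usage at which the current hardware runs out.
max_usage() {
    awk -v top="$1" -v peak="$2" -v use="$3" 'BEGIN {
        ratio = peak / use
        printf "max_usage=%.0f headroom=%.0f%%\n", top / ratio, 100 * (1 - peak / top)
    }'
}

max_usage 1500 900 30000   # prints max_usage=50000 headroom=40%
```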
  • 41. caching maximums
  • 42. caching ceilings: squid, memcached • working-set specific: • - tiny enough to all fit in memory? • - some/more/all on disk? • - watch LRU churn
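A back-of-envelope check of "tiny enough to all fit in memory?" multiplies the hot object count by the average object size. This sketch ignores per-object cache overhead, and its inputs are assumptions: ~2M objects echoes the "squid's RAM" figure from the stats slide, while the 12 KB average size and 32 GB of cache RAM are made up.

```shell
#!/bin/sh
# working_set_check HOT_OBJECTS AVG_KB RAM_GB   (hypothetical helper)
# Rough test of whether a cache's hot working set fits in memory (binary units).
working_set_check() {
    awk -v n="$1" -v kb="$2" -v gb="$3" 'BEGIN {
        need = n * kb / (1024 * 1024)             # KB -> GB
        verdict = (need <= gb + 0) ? "fits" : "spills to disk"
        printf "%s need=%.1fGB have=%sGB\n", verdict, need, gb
    }'
}

working_set_check 2000000 12 32   # prints fits need=22.9GB have=32GB
```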
  • 43. churning full caches • Ceilings at: • - LRU reference age small enough to hurt the hit ratio too much • - request rate large enough to push disk I/O to 100%
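Given paired samples of request rate and disk utilization, the disk-bound ceiling is the lowest request rate at which utilization crosses an agreed threshold. A minimal sketch, assuming the pairs were collected from something like iostat's %util and squid's request rate over the same intervals; the helper name, threshold and sample values are hypothetical.

```shell
#!/bin/sh
# first_ceiling UTIL_LIMIT   (hypothetical helper)
# stdin:  "req_per_sec disk_util_percent" samples
# stdout: lowest request rate whose disk utilization reached UTIL_LIMIT, or "none"
first_ceiling() {
    sort -n | awk -v lim="$1" '
        $2 >= lim + 0 { print $1; found = 1; exit }
        END { if (!found) print "none" }'
}

printf '400 60\n600 92\n500 75\n700 99\n' | first_ceiling 90   # prints 600
```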
  • 44. squid requests and hits
  • 45. squid hit ratio
  • 46. LRU reference age
  • 47. hit response times
  • 48. What we know now • we can do at least 620 req/sec (peak) without: • - LRU affecting hit ratio • - unacceptable avg response time • - waiting too much on disk I/O
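A measured ceiling turns a traffic forecast into a server count. This sketch assumes the 620 req/sec figure is a per-server ceiling (the slide does not say so explicitly); the forecast peak of 5000 req/sec, the one spare server and the helper name are hypothetical.

```shell
#!/bin/sh
# servers_needed FORECAST_PEAK PER_SERVER_CEILING SPARES   (hypothetical helper)
servers_needed() {
    awk -v peak="$1" -v per="$2" -v spare="$3" 'BEGIN {
        x = peak / per
        n = int(x)
        if (x > n) n++        # round up: part of a server is still a whole server
        print n + spare
    }'
}

servers_needed 5000 620 1   # prints 10
```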
  • 49. not full caches • (working set smaller than max size) • - request rate large enough to bring network or CPU to 100%
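When the working set fits in memory, the network link is a likely first ceiling, and its utilization can be estimated from request rate and average response size. This sketch ignores protocol overhead; the helper name and the inputs (2000 req/sec, 50 KB responses, a 1000 Mbit/s link) are hypothetical.

```shell
#!/bin/sh
# nic_util REQ_PER_SEC AVG_KB LINK_MBIT   (hypothetical helper)
# Estimates outbound bandwidth and link utilization for an in-memory cache.
nic_util() {
    awk -v r="$1" -v kb="$2" -v speed="$3" 'BEGIN {
        mbit = r * kb * 8 / 1000          # KB/s (1 KB = 1000 bytes) -> Mbit/s
        printf "%.0fMbit/s %.0f%%\n", mbit, 100 * mbit / speed
    }'
}

nic_util 2000 50 1000   # prints 800Mbit/s 80%
```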
  • 50. deployment
  • 51. Automated Deploy Tools • SystemImager/SystemConfigurator: • - http://wiki.systemimager.org • CVSup: • - http://www.cvsup.org • Subcon: • - http://code.google.com/p/subcon/
  • 52. questions? • photo credits: • http://flickr.com/photos/gaspi/62165296/ • http://flickr.com/photos/marksetchell/27964330/ • http://flickr.com/photos/sheeshoo/72709413/ • http://flickr.com/photos/jaxxon/165559708/ • http://flickr.com/photos/bambooly/298632541/ • http://flickr.com/photos/colloidfarl/81564759/ • http://flickr.com/photos/sparktography/75499095/