Capacity Planning For LAMP

    Capacity Planning For LAMP - Presentation Transcript

    1. capacity planning for LAMP • what happens after you’re scalable • MySQL Conf and Expo, April 2007
    2. John Allspaw • Engineering Manager (Operations) at flickr (Yahoo!)
    3. Yay! • You’re scalable! (or not) • Now you can simply add hardware as you need capacity. • (right ?)
    4. • But: • How many servers ?
    5. BUT, um, wait.... • How many databases ? • How many webservers ? • How much shared storage ? • How many network switches ? • What about caching ? • How many CPUs in all of these ? • How much RAM ? • How many drives in each ? • WHEN should we order all of these ?
    6. some stats • - ~35M photos in squid cache (total) • - ~2M photos in squid’s RAM • - ~470M photos, 4 or 5 sizes of each • - 38k req/sec to memcached (12M objects) • - 2 PB raw storage (about 1.5 TB consumed on Sunday)
    7. capacity
    8. capacity doesn’t mean speed
    9. capacity is for business
    10. Buying enough, for now • too much / not enough • too soon / too late
    11. 3 main parts • - Planning (what ?/why ?/when ?) • - Deployment (install/config/manage) • - Measurement (graph the world)
    12. boring queueing theory • Forced Flow Law: $X_i = V_i \times X_0$ • Little’s Law: $N = X \times R$ • Service Demand Law: $D_i = V_i \times S_i = U_i / X_0$
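
To make the three laws on slide 12 concrete, here is a small worked example. All numbers are invented for illustration and do not come from the talk. Here $X_0$ is system throughput, $V_i$ the number of visits each request makes to resource $i$, $X_i$ the throughput at that resource, $S_i$ its service time per visit, $U_i$ its utilization, $D_i$ its service demand per request, $N$ the number of requests in the system, and $R$ the response time.

```latex
\begin{align*}
X_0 &= 100\ \text{req/s}, \quad V_i = 15, \quad S_i = 0.0004\ \text{s}, \quad R = 0.2\ \text{s} \\
X_i &= V_i \times X_0 = 15 \times 100 = 1500\ \text{queries/s} \\
D_i &= V_i \times S_i = 15 \times 0.0004 = 0.006\ \text{s} \\
U_i &= D_i \times X_0 = 0.006 \times 100 = 0.6 \\
N &= X_0 \times R = 100 \times 0.2 = 20
\end{align*}
```

In words: under these assumed figures the database would be 60% utilized, and about 20 requests would be in flight at any moment.
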
    13. my theory • capacity planning math is based on real things, not abstract ones.
    14. predicting the future
    15. consumable
    16. concurrent usage
    17. considerations: social applications • - Have the ‘network effect’ • - Exponential growth
    18. considerations: social applications • Event-related growth (press, news event, social trends, etc.) • Examples: London bombing, holidays, tsunamis, etc.
    19. What do you have NOW ? • When will your current capacity be depleted or outgrown ?
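
One rough way to approach the "when" question on slide 19 is to assume steady compound growth and solve for the time at which usage reaches a known ceiling. This is only a sketch: the function name `months_to_ceiling`, the 10% monthly growth rate, and the sample figures are all hypothetical, and event-driven spikes like those on slide 18 would break the steady-growth assumption.

```sh
#!/bin/sh
# months_to_ceiling CURRENT CEILING MONTHLY_GROWTH
# Prints the number of months until CURRENT reaches CEILING, assuming
# usage grows by a fixed fraction each month (0.10 means 10% per month).
# Hypothetical helper; not taken from the talk.
months_to_ceiling() {
    awk -v cur="$1" -v limit="$2" -v g="$3" 'BEGIN {
        if (cur <= 0 || g <= 0) { print "n/a"; exit }   # no forecast possible
        if (cur >= limit)       { print "0.0"; exit }   # already at the ceiling
        # cur * (1 + g)^t = limit  =>  t = ln(limit / cur) / ln(1 + g)
        printf "%.1f\n", log(limit / cur) / log(1 + g)
    }'
}

# Example with made-up numbers: 30000 units of usage today, a ceiling of 50000.
months_to_ceiling 30000 50000 0.10
```

Rerunning this with fresh measurements each month matters more than the exact formula, since the growth rate itself tends to change.
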
    20. finding ceilings • MySQL (disk IO ?) • SQUID (disk IO ? or CPU ?) • memcached (CPU ? or network ?)
    21. forget benchmarks • boring • usually not worth the time for capacity planning • not representative of real load
    22. • test in production
    23. what do you expect ? • define what is acceptable • examples: • squid hits should take less than X milliseconds • SQL queries should take less than Y milliseconds and still keep up with replication
    24. measurement
    25. accept the observer effect • measurement is a necessity. • it’s not optional.
    26. http://ganglia.sf.net
    27. [Diagram: Ganglia layout. gmetad collects XML over TCP from two clusters, db1 to db3 and www1 to www3; within each cluster, nodes exchange XML over UDP multicast (239.2.11.84 and 239.2.11.83).]
    28. [Same diagram, with a failure marked “boom!”.]
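    29. super simple graphing
        #!/bin/sh
        # Take two extended iostat samples of sda, 4 seconds apart, and keep the last report.
        /usr/bin/iostat -x 4 2 sda | grep -v '^$' | tail -4 > /tmp/disk-io.tmp
        # Field 14 is taken as %util here; the column position varies across sysstat versions.
        UTIL=`grep sda /tmp/disk-io.tmp | awk '{print $14}'`
        # Publish the value to Ganglia as a custom metric.
        /usr/bin/gmetric -t uint16 -n disk-util -v$UTIL -u '%'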
    30. memcached
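
The same gmetric pattern from slide 29 could plausibly be applied to memcached. The sketch below is an assumption, not the script behind the slide's graph: the function name `memcached_hit_pct` and the metric name `memcached-hit-pct` are hypothetical. It parses the `get_hits` and `get_misses` counters from the output of memcached's `stats` command and prints an integer hit percentage.

```sh
#!/bin/sh
# memcached_hit_pct: reads memcached "stats" output on stdin and prints
# the hit ratio as a whole-number percentage (0 if there were no gets).
# Hypothetical helper; not taken from the talk.
memcached_hit_pct() {
    # The memcached text protocol ends lines with \r\n; strip the \r first.
    tr -d '\r' | awk '
        $1 == "STAT" && $2 == "get_hits"   { hits = $3 }
        $1 == "STAT" && $2 == "get_misses" { misses = $3 }
        END {
            total = hits + misses
            if (total > 0) printf "%d\n", (hits * 100) / total
            else print 0
        }'
}

# Possible usage (assumes nc is installed and memcached listens on localhost:11211):
# HIT=`printf 'stats\r\nquit\r\n' | nc localhost 11211 | memcached_hit_pct`
# /usr/bin/gmetric -t uint16 -n memcached-hit-pct -v "$HIT" -u '%'
```

Note that these counters are cumulative since the daemon started, so this yields a lifetime ratio; graphing the change between successive samples would show recent behaviour instead.
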
    31. what if you have graphs but no raw data ? • GraphClick • http://www.arizona-software.ch/applications/graphclick/en/
    32. application usage • Usage stats are just as important • as server stats! • Examples: • # of user registrations • # of photos uploaded every hour
    33. not a straight line
    34. another not straight line
    35. but straight relationships!
    36. measurement examples
    37. queries
    38. disk I/O
    39. What we know now • we can do at least 1500 qps (peak) without: • - slave lag • - unacceptable avg response time • - waiting on disk IO
    40. MySQL capacity • 1. find ceilings of existing h/w • 2. tie app usage to server stats • 3. find ceiling:usage ratio • 4. do this again: • - regularly (monthly) • - when new features are released • - when new h/w is deployed
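
Steps 2 and 3 on slide 40 can be sketched as simple arithmetic, assuming a roughly linear relationship between application usage and server load (the "straight relationships" of slide 35). The function name `usage_at_ceiling` is hypothetical, and the observed figures (900 qps at 30000 photo uploads per hour) are made up; only the 1500 qps ceiling comes from slide 39.

```sh
#!/bin/sh
# usage_at_ceiling CEILING_QPS OBSERVED_QPS OBSERVED_USAGE
# Given a measured ceiling and one observed (qps, usage) pair, prints the
# application usage level at which the ceiling would be reached, assuming
# qps scales linearly with usage. Hypothetical helper; not from the talk.
usage_at_ceiling() {
    awk -v ceiling="$1" -v qps="$2" -v usage="$3" 'BEGIN {
        if (qps <= 0) { print "n/a"; exit }
        # qps / usage is the load per unit of usage, so
        # usage at ceiling = ceiling / (qps / usage) = ceiling * usage / qps
        printf "%.0f\n", ceiling * usage / qps
    }'
}

# Example: ceiling of 1500 qps, with 900 qps observed at 30000 uploads/hour.
usage_at_ceiling 1500 900 30000
```

With these sample inputs the database would reach its ceiling at 50000 uploads per hour. Step 4 still applies: the ratio shifts when features or hardware change, so the inputs need to be remeasured.
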
    41. caching maximums
    42. caching ceilings • squid, memcached • working-set specific: • - small enough to fit entirely in memory ? • - some/more/all on disk ? • - watch LRU churn
    43. churning full caches • Ceilings at: • - LRU ref age small enough to affect hit ratio too much • - Request rate large enough to affect disk IO (to 100%)
    44. squid requests and hits
    45. squid hit ratio
    46. LRU reference age
    47. hit response times
    48. What we know now • we can do at least 620 req/sec (peak) without: • - LRU affecting hit ratio • - unacceptable avg response time • - waiting too much on disk IO
    49. not full caches • (working set smaller than max size) • - request rate large enough to bring network or CPU to 100%
    50. deployment
    51. Automated Deploy Tools • SystemImager/SystemConfigurator: • - http://wiki.systemimager.org • CVSup: • - http://www.cvsup.org • Subcon: • - http://code.google.com/p/subcon/
    52. questions ? • http://flickr.com/photos/gaspi/62165296/ • http://flickr.com/photos/marksetchell/27964330/ • http://flickr.com/photos/sheeshoo/72709413/ • http://flickr.com/photos/jaxxon/165559708/ • http://flickr.com/photos/bambooly/298632541/ • http://flickr.com/photos/colloidfarl/81564759/ • http://flickr.com/photos/sparktography/75499095/

    John Allspaw, 3 months ago

    Presented at the MySQL User's Conference in 2007.

    © All Rights Reserved
