Things that can cause downtime
bugs (disguised as capacity problems)
edge cases (disguised as capacity problems)
security incidents
real capacity problems*
* (should be the last thing you need to worry about)
Thank You HPC Industry!
Automated Stuff
Scalable Metric Collection/Display
A lot of great deployment and management tricks come from them, adopted by web ops
Clouds need planning too
Makes deployment and procurement easy and quick
But clouds are still resources with costs and limits, just like your own stuff
Black-boxes: you may need to pay even more attention than before
Diagonal Scaling example: image processing
throughput: ~45 images/min @ peak, versus ~140 images/min @ peak (same CPU usage, but ~3x more work)
“processing” means making 4 sizes from originals
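The "~3x more work" figure can be checked with a little arithmetic. This is only a sketch: it assumes the first number is the earlier setup and the second is the later one, which the transcript does not state outright, and the variable names are made up for illustration.

```python
# Peak throughput figures quoted on the slide (images per minute).
# Assumption: 45 is the earlier setup, 140 is the later one.
before_peak = 45
after_peak = 140

# Ratio of work done at the same CPU usage.
speedup = after_peak / before_peak
print(f"{speedup:.1f}x more images per minute")

# "Processing" means making 4 sizes from each original,
# so the number of resized files written per minute is:
SIZES_PER_IMAGE = 4
print(before_peak * SIZES_PER_IMAGE, after_peak * SIZES_PER_IMAGE)
```

The ratio comes out a little above 3, which matches the slide's rounded "~3x".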
Stupid Capacity Tricks
quick and dirty management
[root@netmon101 ~]# dsh -N group.of.servers
dsh> date
executing 'date'
www100: Mon Jun 23 14:14:53 UTC 2008
www118: Mon Jun 23 14:14:53 UTC 2008
dbcontacts3: Mon Jun 23 07:14:53 PDT 2008
admin1: Mon Jun 23 14:14:53 UTC 2008
admin2: Mon Jun 23 14:14:53 UTC 2008
dsh>
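One thing the session above makes easy to spot is the host reporting PDT while the rest report UTC. A minimal sketch of automating that check follows. It assumes the output is captured as text in the `host: <date output>` shape shown above; `timezone_outliers` is a hypothetical helper written for this example, not part of dsh.

```python
from collections import Counter


def timezone_outliers(dsh_output: str) -> list:
    """Return hosts whose reported timezone differs from the most common one."""
    zones = {}
    for line in dsh_output.splitlines():
        host, sep, stamp = line.partition(": ")
        fields = stamp.split()
        # `date` prints e.g. "Mon Jun 23 14:14:53 UTC 2008"; the zone is the
        # second-to-last field. Skip prompts and echo lines without that shape.
        if not sep or len(fields) != 6:
            continue
        zones[host.strip()] = fields[-2]
    if not zones:
        return []
    majority, _ = Counter(zones.values()).most_common(1)[0]
    return sorted(h for h, z in zones.items() if z != majority)


sample = """\
executing 'date'
www100: Mon Jun 23 14:14:53 UTC 2008
www118: Mon Jun 23 14:14:53 UTC 2008
dbcontacts3: Mon Jun 23 07:14:53 PDT 2008
admin1: Mon Jun 23 14:14:53 UTC 2008
admin2: Mon Jun 23 14:14:53 UTC 2008
"""

print(timezone_outliers(sample))
```

Note that 07:14:53 PDT and 14:14:53 UTC are the same instant, so the sketch flags a configuration difference rather than clock drift.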
Stupid Capacity Tricks
Turn Stuff OFF
Disable heavy-ish features of the site (on/off switches)
We have 195 different things to disable in case of emergency.
Stupid Capacity Tricks
Turn Stuff OFF
uploads (photo)
uploads (video)
uploads by email
various API things
various mobile things
various search things
etc., etc.
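The on/off switches described above could look roughly like the sketch below. It is illustrative only: the `FeatureSwitches` class, the feature names, and the response strings are assumptions made for this example, and a real deployment would keep the switch state somewhere shared (config, database, cache) rather than in process memory.

```python
class FeatureSwitches:
    """In-memory on/off switches; every feature is on unless disabled."""

    def __init__(self, names):
        self._enabled = {name: True for name in names}

    def disable(self, name):
        self._set(name, False)

    def enable(self, name):
        self._set(name, True)

    def _set(self, name, value):
        # Reject typos so an emergency "off" never silently does nothing.
        if name not in self._enabled:
            raise KeyError(f"unknown feature: {name}")
        self._enabled[name] = value

    def is_enabled(self, name):
        return self._enabled[name]


# Hypothetical feature names modelled on the list above.
switches = FeatureSwitches(["photo_uploads", "video_uploads", "uploads_by_email"])


def handle_video_upload(payload):
    # Check the switch before doing any expensive work.
    if not switches.is_enabled("video_uploads"):
        return "503 video uploads are temporarily disabled"
    return f"200 accepted {len(payload)} bytes"


print(handle_video_upload(b"abc"))
switches.disable("video_uploads")
print(handle_video_upload(b"abc"))
```

Checking the switch at the top of the handler means a disabled feature costs almost nothing per request, which is the point of turning it off under load.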
Stupid Capacity Tricks
Outages Happen
Host your outage/status/blog page in more than one datacenter.
Tell your users WTF is going on, they’ll appreciate it.
Stupid Capacity Tricks
Hit the Pause Button
Bake the dynamic into static
Some Y! properties have a big red button to instantly bake (and un-bake) at will
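As a rough illustration of baking and un-baking, the sketch below freezes a dynamically rendered page to a file and serves that file until it is removed. Everything here is hypothetical: `render_dynamic` stands in for an expensive per-request render, and `Bakery` is a name invented for this example. It does not describe how any Y! property actually implements its button.

```python
import itertools
import tempfile
from pathlib import Path

_render_counter = itertools.count(1)


def render_dynamic():
    """Stand-in for an expensive per-request page render (hypothetical)."""
    return f"<html><body>render #{next(_render_counter)}</body></html>"


class Bakery:
    def __init__(self, static_path):
        self.static_path = Path(static_path)

    def bake(self):
        # Render once and freeze the result on disk.
        self.static_path.write_text(render_dynamic(), encoding="utf-8")

    def unbake(self):
        # Remove the snapshot so requests go back to dynamic rendering.
        self.static_path.unlink(missing_ok=True)

    def serve(self):
        # Serve the baked snapshot if one exists, else render on demand.
        if self.static_path.exists():
            return self.static_path.read_text(encoding="utf-8")
        return render_dynamic()


with tempfile.TemporaryDirectory() as tmp:
    bakery = Bakery(Path(tmp) / "index.html")
    bakery.bake()
    first = bakery.serve()
    second = bakery.serve()
    bakery.unbake()
    third = bakery.serve()

print(first == second, third != first)
```

While baked, every request gets the same frozen page with no rendering work; un-baking restores the dynamic path without a deploy.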