Evan Gilman presented on how PagerDuty optimized the Chef resource usage that had led to bloated Chef runs. As PagerDuty grew, Chef runs suffered CPU spikes, long run times, and out-of-memory errors. To address this, the team measured Chef run times and resource usage, identified inefficiencies such as full node searches and redundant API calls, and implemented optimizations such as partial searches, result memoization, and centralized API access. These changes reduced Chef run memory usage by 88% and run time by 84%, allowing faster convergence and less resource-intensive runs.
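The memoization and centralized-access ideas in the summary can be sketched in plain Ruby. This is a minimal illustration, not PagerDuty's actual code: the NodeLookup class, its backend block, and the fake_nodes data are all hypothetical stand-ins for whatever wraps the real Chef search. The point is only that every caller goes through one object, and each distinct query reaches the backend once.

```ruby
# Hypothetical helper (not from the talk): a single shared entry point for
# expensive lookups that caches each result for the lifetime of the object.
class NodeLookup
  attr_reader :backend_calls

  # backend: any block that takes a query string and returns results.
  # In a real cookbook it would wrap a Chef search; here it is a stand-in.
  def initialize(&backend)
    @backend = backend
    @cache = {}
    @backend_calls = 0
  end

  def search(query)
    # Hash#fetch runs the block only on a cache miss, so empty or nil
    # results are cached too and never trigger a second backend call.
    @cache.fetch(query) do
      @backend_calls += 1
      @cache[query] = @backend.call(query)
    end
  end
end

# Stand-in data. Only the fields a recipe needs are kept per node, which is
# the same idea a partial search applies on the server side.
fake_nodes = {
  'role:web' => [{ 'name' => 'web1', 'ip' => '10.0.0.1' }],
  'role:db'  => [{ 'name' => 'db1',  'ip' => '10.0.0.2' }]
}

lookup = NodeLookup.new { |query| fake_nodes.fetch(query, []) }

# Three recipes asking the same question cost one backend call, not three.
3.times { lookup.search('role:web') }
lookup.search('role:db')

puts lookup.backend_calls # prints 2
```

Routing every lookup through one object is what makes the cache effective: if each recipe built its own helper, the same query would still be issued once per recipe.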
3. 4/3/15
Agenda
BLOATED CHEFS
1. Chef resources in use at PD
2. Problems encountered as we grew
3. Measuring chef-client run
4. How we fixed it
5. How fast is it now?
16. 4/3/15
As we grew…
• CPU spikes during chef-client runs
• Awkward pauses at the beginning of the run
• chef-client run took several minutes
• chef-client OOM
50. 4/3/15
Other Nasties
• Too many conditional guards
• tmpfs storage
• Multiple package resources (Chef 12)
Six seconds for twelve packages