11. The Science of Monitoring
“He who wishes to fight must first count the cost.”
― Sun Tzu, The Art of War
12. What should you monitor?
Supporting Services
RabbitMQ
Solr
PostgreSQL
Application Services
Bifrost
Application Logs
13. Tools 101
• StatsD – A network daemon that runs on the Node.js platform and listens for
statistics, like counters and timers.
https://github.com/etsy/statsd
• Grafana - Beautiful dashboards
• TICK Stack – A series of tools that comprise the ‘Influx Data Platform’, including
an easily scalable time series database.
https://influxdata.com/time-series-platform/
• Sensu - Monitoring that doesn't suck.
https://sensuapp.org/
• Splunk – centralized logging, operational intelligence, and machine data analytics
http://www.splunk.com/
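• As a quick sketch of what feeding StatsD looks like from Ruby, here is a minimal example using the statsd-ruby gem as the client (any StatsD client works; the metric names and the localhost:8125 endpoint are placeholders for your own setup):

    require 'statsd'   # gem install statsd-ruby

    statsd = Statsd.new('localhost', 8125)       # host/port of your StatsD daemon

    statsd.increment('chef.api.requests')        # a counter: one more request seen
    statsd.timing('chef.api.request_ms', 320)    # a timer, in milliseconds
    statsd.gauge('chef.api.queue_depth', 12)     # a point-in-time gauge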
19. Instrumenting our Erlang Based Services – Collecting Logs
• Use a full featured log collector like Splunk to centralize logs.
• All of our services log into a common directory structure:
/var/log/opscode/<service name>
• The two most important files within that directory are:
current
error
• There are also request logs which repeat information available elsewhere
• All services shipped with the omnibus package, not just Erlang services, log
here
22. Sometimes Ohai tuning is needed (e.g., Centrify)
ALWAYS USE PARTIAL SEARCH!
(and look at SafeSearch)
Know what a dependency graph is
… and manage it.
25. Chef-server.rb
• https://docs.chef.io/config_rb_server.html
• https://docs.chef.io/config_rb_server_optional_settings.html
• https://github.com/chef/chef-server/blob/master/omnibus/files/private-chef-cookbooks/private-chef/attributes/default.rb
• How does chef-server.rb work?
The Chef server's reconfigure is driven by a cookbook called PrivateChef.
PrivateChef is a cookbook that's just like any other, with some helper libraries to read your chef-server.rb and make sense of it.
• Actually tuning a setting:
opscode_erchef['db_pool_size'] = "20"
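• A minimal chef-server.rb sketch of applying that override, assuming the default omnibus paths (the value simply mirrors the example above and is not a recommendation on its own):

    # /etc/opscode/chef-server.rb
    opscode_erchef['db_pool_size'] = "20"

    # settings only take effect after reconfiguring each affected node:
    #   chef-server-ctl reconfigure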
26. A quick look at PrivateChef
You can see we're creating a new module called PrivateChef.
The configuration attributes are defined as new Mashes, so when you write opscode_erchef['key'] = value, you're really just assigning a value to the Mash created in the PrivateChef module.
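• An illustrative sketch of that pattern, assuming a chef-server.rb at the default path (this is not the actual PrivateChef source; the real cookbook uses Mash rather than plain Hash and handles many more services):

    module PrivateChef
      extend self

      # one settings collection per tunable service
      def opscode_erchef
        @opscode_erchef ||= {}
      end

      # chef-server.rb is evaluated against this module, so a line like
      #   opscode_erchef['db_pool_size'] = "20"
      # simply assigns into the collection returned above
      def from_file(path)
        instance_eval(IO.read(path), path, 1)
      end
    end

    PrivateChef.from_file('/etc/opscode/chef-server.rb')
    p PrivateChef.opscode_erchef['db_pool_size']   # => "20"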
32. More Useful Tools
• PGBadger - https://github.com/dalibo/pgbadger
• Monitor Postgresql: https://wiki.postgresql.org/wiki/Monitoring
• How to Monitor Nginx: https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide
• Pgtune - http://pgfoundry.org/projects/pgtune
pgtune takes the wimpy default postgresql.conf and expands the database server to be as
powerful as the hardware it's being deployed on
Be careful about shared resources; pgtune assumes you have a dedicated PostgreSQL server.
• GCViewer
Helps you analyze your GC activity, so you can make decisions on tuning.
http://www.tagtraum.com/gcviewer.html
34. Special Thanks
• Irving Popovetsky and his “Tuning the Chef Server for Scale” blog post:
http://irvingpop.github.io/blog/2015/04/20/tuning-the-chef-server-for-scale/
• Mark Harrison, Paul Mooring and the Chef server team. The dashboards are
heavily based on their dashboards for hosted Chef.
• Phil Dibowitz and Facebook, for teaching Andrew a lot about tuning the Chef server at a scale that almost none of our other customers hit.
35. Live Demo
• Link to GitHub: https://github.com/andy-dufour/chef-server-monitoring/
Editor's Notes
When you don’t have proper monitoring in place, you are constantly fighting a war against incidents and service interruptions.
We believe that monitoring is an art that is fed and nourished by science. We wanted to kick off today by talking about the art of monitoring. We'll then get into the science and details of what you should be monitoring, and wrap up with a demo specially prepared for you by Andrew.
But before we get going too fast, we want to define what the problem is that we’re looking to solve. We need to be able to make effective decisions and to effectively respond to incidents. We believe that visibility into our systems is necessary to solve this problem. And monitoring provides that visibility.
2 types of monitoring
Reactive alerting – when you are paged out because some conditions were met (usually at 3 am for some reason)
Business Intelligence – display of data and metrics in a consumable way that helps drive tuning, prioritize work, identify trends, and proactively prevent issues.
Now that we know what the problem is that we’re solving, what approach should we take?
Let's start small, and get moving quickly. What is the most important thing to know when we're monitoring? Whether the application is up or down!
Next we should build out the smallest useful monitoring profile – follow the 5 minute rule. What are the things you would check in the first 5 minutes of logging into the system to see if the application is healthy or unhealthy? Those are the things you should be monitoring for at first.
Next level of importance is to get instrumentation in place to provide the business intelligence that we’ll need in the future.
First rule should be a simple up/down rule
Build out the smallest possible monitoring profile based on real experience
Resist the urge to build out everything you can think of – 5 minute rule.
A very common pitfall is to attempt to build the perfect system. Spoiler alert: it doesn’t exist.
There is a reason that alongside the DevOps movement, micro-services have become a fad – simple systems are easier to implement, less fault prone, and easier to reason about as a human. For these reasons, they tend to be much more stable.
Especially in a monitoring system, stability is a good thing.
So try to keep your monitoring rules as simple as possible while covering all of your important use-cases. The best way to do this, is to start by asking yourselves the question “What is really important to our application and end-users?” Why would we write a monitor for network bandwidth, when our application is only latency-sensitive?
Simple systems are easier to implement
Simple systems are less fault prone
In a monitoring system, stability is a good thing
Figure out what you care about, and start there. Is there a reason we should monitor bandwidth when our service is only latency sensitive?
You don't have a scale problem until you do, and you probably don't yet. Don't over-architect your systems or monitors for problems you don't yet have. Be aware of the real things that are causing issues in your application (through business intelligence), and monitor for those things.
You don’t have a scale problem until you have a scale problem.
We firmly believe that continuous improvement is essential in almost all processes that exist.
When you come across a real issue that you’re currently not monitoring for – add in the monitor for it. The system doesn’t have to be perfect, it just has to be good enough.
Once something has an alert, then you should use the metrics from your business intelligence to prioritize resolution. This could be a newer version of the application, tuning the system, or some form of automation. In a perfect world, you would never see the same alert twice. However, the world is not perfect, and none of us have unlimited time. So use your monitoring tools, to prioritize fixes in the way that gets you the most sleep.
Continuously work to improve your systems – the more you invest back into your applications and infrastructure, the better your returns.
Can I see a show of hands? How many of you get more than 20 emails in a day? Keep them up if you read and action each of those emails. Now what if you’re getting 50? 100? 200 emails a day? If no one is reading the alerts, is there still an issue?
So if you see an alert that is firing frequently – it should probably be your top priority to resolve. If the alert is just spam, get rid of it.
And remember: alert fatigue is real – don't drown in a sea of numbers!
Monitor everything – why do we care that our Chef Server is up, when our application is down?
Likewise, while Lean tells us to use the best tool for the job, it's unlikely that your infrastructure and your applications are different enough to warrant different tools, or artisanal, custom-designed tools. Avoid the temptation to write something special – use what's already in place, or choose the thing that allows you to move most freely.
DevOps isn’t just a movement about people, processes and technology. The motivations that are driving those things are about providing value to your business. It’s also a movement about metrics.
Who cares if your Chef server is up if your eCommerce site is down and no one can buy your product?
Having instrumentation and metrics for more apps than just your Chef server is essential.
You should either reuse the monitoring stack you build for your Chef server to also monitor your applications, or use the monitoring tools you already have for your applications to monitor the Chef server.
Say NO to artisanal hand crafting of application stacks.
There are – perhaps - some cold hard truths on this slide.
Hammer home: no artisanal monitoring stacks, and monitor your other apps.
Hardware/OS:
CPU – user, system, idle, iowait, irq, steal, load average
Memory – free, used, swap
Disk – space, utilization
Centralized logging (Splunk, ELK) for syslog
You should be monitoring the applications we bundle into the chef-server omnibus packages – Postgres, Solr, RabbitMQ, Nginx.
For the Chef server itself, we'll talk about instrumenting our Erlang services.
StatsD – stats are sent over UDP or TCP, and StatsD sends aggregates to one or more pluggable backend services (e.g., Graphite).
Grafana – aggregates multiple data sources into dashboards.
TSDB - Time-series data is nothing more than a sequence of data points, typically consisting of successive measurements made from the same source over a time interval. Put another way, if you were to plot your points on a graph, one of your axes would always be time.
- We actually started building the cookbook with Graphite, but switched to InfluxDB because it is easier to use than Carbon and Graphite.
Sensu – Why do legacy monitoring tools suck in the cloud?
There are some issues with Folsom (rant about histograms), but it will give you some useful statistics, such as stats for each of the pools instrumented by our Erlang pooler library.
Instrument three things: stats_hero, folsom_graphite, and logs.
Using a central logging framework like an ELK stack, or Splunk, you should collect your application logs in a central place.
Logs are located under /var/log/opscode/
There are subdirectories for each service (e.g., RabbitMQ, PostgreSQL)
You should at least collect the current and error logs for each service, from each node in your chef-server cluster.
All logs on the Chef server are rotated frequently; by shipping logs you're both making them easier to access and preserving them in the event of an incident that isn't detected right away.
Let’s find a common language to talk about Chef server load
Talking about number of nodes is almost useless when discussing Chef server scale.
How often do your nodes converge?
What’s their splay?
Adam doesn't have a scale problem!
Set your splay to almost the same duration as your interval for client runs. This allows for maximum randomization of your runs.
Look at how splay actually works…
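A minimal client.rb sketch of that interval/splay pairing (the numbers are only an example; pick values that match how often your nodes actually need to converge):

    # /etc/chef/client.rb
    interval 1800   # seconds between chef-client runs (30 minutes)
    splay    1500   # random 0-1500 second delay per run, so nodes don't converge in lockstep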
If you’re dealing with an extremely high load system you should consider limiting the Ohai data you collect and store only the ohai data you need. Get it, little Ohai? Hah. I kill me. Especially at 2AM.
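A hedged client.rb sketch of limiting Ohai collection; the plugin names are only examples of plugins you might never query, and the ohai.disabled_plugins form assumes a reasonably recent chef-client (older clients used Ohai::Config instead):

    # /etc/chef/client.rb
    ohai.disabled_plugins = [:Passwd, :Sessions]   # skip Ohai data you never use
    # older chef-clients: Ohai::Config[:disabled_plugins] = [:Passwd, :Sessions]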
Eliminate redundant and unnecessary search use; ALWAYS use partial search.
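For example, a recipe snippet that pulls back only the attributes it needs, using Chef 12's filter_result form (older clients got the equivalent from the partial_search cookbook; the role and attribute names below are hypothetical):

    # fetch just two attributes per node instead of entire node objects
    web_nodes = search(:node, 'role:web',
                       filter_result: { 'ip' => ['ipaddress'], 'name' => ['fqdn'] })
    web_nodes.each { |n| puts "#{n['name']}: #{n['ip']}" }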
Set a policy that only the last N (let's say 5) versions of your cookbook will be kept on the Chef server.
The rest can stay in git history if you really need them.
200 versions of your application cookbook on the Chef server when only two versions are ever in use is useless and complicates your dependency graph.
Alternatively, ensure you use environments and environment cookbooks with tight dependency constraints.
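A sketch of what those tight constraints can look like in an environment file (the cookbook names and versions here are made up):

    # environments/production.rb
    name 'production'
    description 'Production systems'
    cookbook 'my_app', '= 2.1.3'     # pin exactly what production runs
    cookbook 'base',   '~> 1.4.0'    # allow patch releases of 1.4 only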
Don’t use DRBD.
Look at our new HA model.
Don’t turn into Homer Simpson - everything is tunable, stay focused on what matters.
NGINX:
Cookbook cache is important to keep load off your Bookshelf service. Your cookbooks are cached on disk on the front-end server instead of requiring an API call to Bookshelf. This is even more important if you’re storing your Bookshelf data in PGSQL.
Extending the S3 URL expiry window delays when Erchef will need to fetch fresh cookbooks.
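A chef-server.rb sketch of those two knobs; the attribute names are taken from the optional-settings docs and should be verified against your Chef server release:

    opscode_erchef['nginx_bookshelf_caching']   = :on     # cache cookbook files on the front end
    opscode_erchef['s3_url_expiry_window_size'] = '100%'  # keep cached URLs valid for their full lifetime
    opscode_erchef['s3_url_ttl']                = 3600    # seconds a signed cookbook URL stays valid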
Bifrost (also applies to Erchef)
Starting in Chef server v12.2 we implemented bounded pools for our database connections and some of our http connections. Prior to this we just kept opening connections till we simply couldn’t.
In a high load environment it’s extremely important to take advantage of these bounded pools and their respective queues. Having 20-50 configured pool connections per service per front-end and 1-2x that available in queue slots is what we recommend for your Chef server.
The Authz service is another bounded queue; it's important when you increase your db pool size that you also increase your authz pool size in order to minimize the overhead of spawning/killing authz processes.
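A rough chef-server.rb sketch of that guidance for Chef server 12.2+; the queue-max attribute names in particular are my reading of the optional-settings docs, so verify them for your version before relying on them:

    opscode_erchef['db_pool_size']      = 40   # per front end, within the 20-50 guidance above
    opscode_erchef['db_pool_queue_max'] = 80   # roughly 1-2x the pool size
    oc_bifrost['db_pool_size']          = 40   # scale the authz (bifrost) pool alongside erchef
    oc_bifrost['db_pool_queue_max']     = 80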
Depsolver workers are single threaded workers that determine your dependency graph. Our recommendation is to have 1 depsolver per CPU on your server if running in a tiered infrastructure, or number of CPUs-1 if running in a standalone infrastructure.
The bounded DB queues have the same rules as bifrost.
Along with managing a pool of depsolvers, Erchef has another CPU-intensive task: generating keys to be provided to Chef clients. If you run in an environment that is constantly registering Chef clients, or that has Chef clients register in waves (e.g., when a new application environment is launched), you may want to increase the number of keys that are pre-generated. Note that starting in chef-client 12 our default is to generate the keys on the client side, so this setting is becoming less important. Unless you are explicitly telling chef-client 12 to get keys from the server, or have a large fleet of Chef 11 clients, this setting may not need to be tuned anymore.
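A chef-server.rb sketch of those two CPU-bound knobs; the keygen attribute name is an assumption from the optional-settings docs, and the numbers assume an 8-CPU standalone server:

    opscode_erchef['depsolver_worker_count'] = 7     # CPUs - 1 on a standalone server; CPUs if tiered
    opscode_erchef['keygen_cache_size']      = 100   # pre-generated client keys, mainly for Chef 11 fleets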
PostgreSQL writes new transactions to the database in files called WAL segments that are 16MB in size. Every time checkpoint_segments worth of these files have been written, by default 3, a checkpoint occurs. Checkpoints can be resource intensive, and on a modern system doing one every 48MB will be a serious performance bottleneck. Setting checkpoint_segments to a much larger value improves that. Unless you're running on a very small configuration, you'll almost certainly be better setting this to at least 10, which also allows usefully increasing the completion target.
We recommend setting checkpoint segments to at least 32 – 64 unless you have a smaller back-end server.
We recommend setting the completion target to 0.9, meaning that checkpoint writes should be spread out to finish by the time we are 90% of the way to the next checkpoint.
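In chef-server.rb terms, that guidance looks roughly like this (PostgreSQL 9.x era settings):

    postgresql['checkpoint_segments']          = 32    # 32-64 unless the back end is small
    postgresql['checkpoint_completion_target'] = 0.9   # spread checkpoint writes over 90% of the interval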
Solr has two settings that we commonly tune – Heap size and new size. Heap size commonly needs to be tuned because the logic in the PrivateChef Cookbook limits us to 1GB of max heap. It’s common to need to push this to 4GB of total heap, and if you have 16GB of memory available on your back-end I’d recommend using 4GB of heap. Since we frequently write new objects into Solr the second setting is new size. The JVM sets new size to be 1/16 of total memory by default, sometimes this needs to be boosted. The maximum you should set your new size to with a 4GB heap is 512MB.
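A chef-server.rb sketch of those Solr JVM settings; the sizes are in megabytes and assume a back end with 16GB of memory:

    opscode_solr4['heap_size'] = 4096   # 4GB max heap
    opscode_solr4['new_size']  = 512    # upper bound for new size with a 4GB heap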
Finally we have RabbitMQ. There really isn’t much to tune here.
We recommend setting a maximum length for your analytics queue.
If you're not using analytics, it may be worthwhile to explicitly disable your analytics queue.
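A hedged chef-server.rb sketch of both options; the analytics_max_length attribute name is from memory, so confirm it in the optional-settings docs before using it:

    rabbitmq['analytics_max_length'] = 10000   # cap the analytics queue so it can't grow without bound
    # if you are not running Chef Analytics at all, skip queueing actions entirely:
    dark_launch['actions'] = false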
Links to useful Sensu plugins