Etsy monitors its infrastructure with a mix of open-source and custom tools, including Ganglia, StatsD, Graphite, syslog-ng, Logster, Splunk, Logstash, Eventinator, Chef, Nagios, and Nagios Herald. These tools collect system, application, and log metrics and events and store them for analysis. The monitoring setup is configured entirely through Chef, using more than 120 monitoring recipes, so deployments stay consistent across environments.
21. @mrtazz
Splunk
• Indexes all of our log files
• Easy search for patterns
• Saved searches for interesting ones
• Basically using it as a glorified grep
22. @mrtazz
Logstash
• Experiment status
• Makes it easier to integrate different sources
• Easy to set up in dev environment
• Trying to figure out where/how it fits into our infrastructure
23. @mrtazz
Eventinator
• Tracks all events in our infrastructure
• Chef runs and changes
• DNS changes
• Network
• Deploys
• Server provisioning and decommissioning
• ~ 12 million events in the last 2 years
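
The kind of event tracking listed above can be pictured as a single append-only log that every tool writes into. The following is a minimal sketch under stated assumptions, not Eventinator's actual schema or API: the `Event` fields, the `EventLog` class, and the `kind` values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One infrastructure event (hypothetical schema, not Eventinator's)."""
    kind: str      # e.g. "chef_run", "dns_change", "deploy"
    source: str    # host or tool that reported the event
    message: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class EventLog:
    """Append-only in-memory log; a real tracker would persist events."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def record(self, event: Event) -> None:
        self._events.append(event)

    def query(self, kind: Optional[str] = None,
              since: Optional[datetime] = None) -> List[Event]:
        """Return events matching the optional kind and lower time bound."""
        return [
            e for e in self._events
            if (kind is None or e.kind == kind)
            and (since is None or e.timestamp >= since)
        ]

    def counts_by_kind(self) -> Dict[str, int]:
        """Count recorded events per kind."""
        counts: Dict[str, int] = {}
        for e in self._events:
            counts[e.kind] = counts.get(e.kind, 0) + 1
        return counts
```

Keeping all event types in one log with a shared timestamp is what makes questions like "what changed right before this alert?" answerable with a single query.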
25. @mrtazz
Chef
• rules everything around me
• Same cookbooks on prod and dev
• Every node runs Chef every 10 minutes
• A ton of knife plugins and handlers
30. @mrtazz
Nagios
• 2 instances in each DC/environment
• Fully Chef generated configuration
• Service checks and contacts in git
• Notifications via email->SMS gateway
• ~75% ops on-call
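
To illustrate what generated Nagios configuration means in practice, here is a rough sketch that renders a Nagios `define service` object from structured data kept in version control. Etsy does this with Chef; the Python below only stands in for the idea. The `ServiceCheck` class, the `render_service` function, the host and contact names, and the `generic-service` template name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceCheck:
    """Data for one service check, as it might be stored in git (hypothetical)."""
    host_name: str
    description: str
    check_command: str
    contacts: List[str]
    template: str = "generic-service"  # assumed name of a Nagios service template


def render_service(check: ServiceCheck) -> str:
    """Render a Nagios `define service` object definition as text."""
    directives = [
        ("use", check.template),
        ("host_name", check.host_name),
        ("service_description", check.description),
        ("check_command", check.check_command),
        ("contacts", ",".join(check.contacts)),
    ]
    lines = ["define service {"]
    for name, value in directives:
        # Pad directive names so the values line up in a column.
        lines.append(f"    {name:<24}{value}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_service(ServiceCheck(
        host_name="web01",
        description="HTTP",
        check_command="check_http",
        contacts=["ops", "oncall"],
    )))
```

Because the checks are plain data, adding or changing one becomes a reviewable commit rather than a hand edit on the Nagios server.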
35. @mrtazz
Nagios Herald
• Add context to Nagios alerts
• What are the first 5 things you do when you get paged?
• You already have the phone in your hand
• Nagios notification handler
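
The idea of adding context to an alert can be sketched as a notification formatter that runs a few "first things you would check" steps and appends their results to the message. This is an illustration only and does not reflect the actual Nagios Herald API (which is written in Ruby). The alert fields, the provider functions, and `format_notification` are hypothetical.

```python
from typing import Callable, Dict, List

Alert = Dict[str, str]
ContextProvider = Callable[[Alert], str]


def disk_usage_hint(alert: Alert) -> str:
    """Hypothetical provider: stands in for data an on-call engineer would fetch."""
    return f"Run `df -h` on {alert['host']} to see which mount is full."


def recent_changes_hint(alert: Alert) -> str:
    """Hypothetical provider: points at recent changes affecting the host."""
    return f"Check recent deploys and Chef runs touching {alert['host']}."


def format_notification(alert: Alert, providers: List[ContextProvider]) -> str:
    """Build the notification body: the raw alert followed by added context."""
    header = f"{alert['state']}: {alert['service']} on {alert['host']}"
    body = [header, alert["output"], "", "Context:"]
    for provider in providers:
        try:
            body.append(f"- {provider(alert)}")
        except Exception as exc:
            # A failing provider must never suppress the page itself.
            body.append(f"- (context unavailable: {exc})")
    return "\n".join(body)


if __name__ == "__main__":
    alert = {
        "state": "CRITICAL",
        "service": "disk",
        "host": "web01",
        "output": "DISK CRITICAL - /var 97% used",
    }
    print(format_notification(alert, [disk_usage_hint, recent_changes_hint]))
```

Catching provider errors is a deliberate choice in this sketch: extra context is helpful, but the alert itself still has to go out when a lookup fails.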
41. @mrtazz
Summary
• Set of trusted tools
• Enhance where they come short
• Try out new things
• Write tools where applicable
• Continuous monitoring and adaptation