A Whirlwind Tour of Etsy's Monitoring Stack

It's no secret that at Etsy we are big fans of small, incremental, and frequent changes and tight feedback loops. This is how we make it possible to deploy changes to our main codebase more than 50 times a day and also safely apply changes to our infrastructure in a continuous fashion. It enables us to rapidly fix bugs and roll out features in our application stack and infrastructure. This, however, would not be possible without a tight feedback loop and a myriad of monitoring tools that keep us informed about changes and possible problems in every nook and cranny of the Etsy stack, whether it's a network change event, system- or application-level performance, or how bad the last week of on-call rotation was.

Published in: Technology, Design

Transcript

  • 1. A Whirlwind Tour of Etsy's Monitoring Stack
      • Daniel Schauenberg
      • dschauenberg@etsy.com
      • @mrtazz
  • 4. Item by TheBackPackShoppe
  • 5. How comfortable are you deploying a change right now?
  • 6. “If this is your first day at Etsy, you deploy the site”
  • 7. Ganglia
      • System level metrics
      • Instance per DC/environment
      • > 220k RRD files
      • Fully configured through Chef role attributes
  • 8. Rainbow Graphs!
  • 9. StatsD
      • Single instance on one server
      • Traffic mostly from 70 Web & 24 API servers
      • Node.js
      • Heavy Sampling
      • Graphite as backend
  • 11. Graphite
      • Application level metrics
      • 96G RAM, 20 Cores, 7.3T SSD RAID 10
      • 525k metrics/minute
      • Mirrored Master/Master Setup
      • Functionally sharded relays
  • 12. Diagram labels: CNAME, relays (×2), caches (×2); statsdtimers, statsdcounts, statsd, chef, logster, fqld, search, generic
  • 15. Syslog-Ng
      • Web, Search, Gearman, Photos, Nagios, Network, VPN
      • 1.2GB written/minute
      • Chef role attribute based config
      • Rule ordering!
  • 16. github.com/etsy/logster
      • Extract metrics from log files
      • Written in Python
      • Runs every minute via cron
  • 17. Splunk
      • Indexes all of our log files
      • Easy search for patterns
      • Saved searches for interesting ones
      • Basically using it as a glorified grep
  • 18. Logstash
      • Experiment status
      • Makes it easier to integrate different sources
      • Easy to set up in dev environment
      • Trying to figure out where/how it fits into our infrastructure
  • 19. Eventinator
      • Tracks all events in our infrastructure
      • Chef runs and changes
      • DNS changes
      • Network
      • Deploys
      • Server provisioning and decommissioning
      • ~ 12 million events in the last 2 years
  • 21. Chef
      • Rules everything around me
      • Same cookbooks on prod and dev
      • Every node runs Chef every 10 minutes
      • Ton of knife plugins and handlers
  • 23. > 120 recipes
  • 25. Nagios
  • 26. Nagios
      • 2 instances in each DC/environment
      • Fully Chef generated configuration
      • Service checks and contacts in git
      • Notifications via email->SMS gateway
      • ~75% ops on-call
  • 27. github.com/lozzd/nagdash
  • 31. Nagios Herald
      • Add context to Nagios alerts
      • What are the first 5 things you do when you get paged?
      • You already have the phone in your hand
      • Nagios notification handler
  • 33. The Toys are real
  • 34. There’s another side of heaven
  • 35. Ops Weekly
  • 36. Ops Weekly
  • 37. Summary
      • Set of trusted tools
      • Enhance where they come short
      • Try out new things
      • Write tools where applicable
      • Continuous monitoring and adaptation
  • 38. Links
      • codeascraft.com
      • etsy.com/codeascraft/talks
      • etsy.github.com
      • etsy.com/careers
  • 39. Questions?
  • 40. A Whirlwind Tour of Etsy's Monitoring Stack
      • Daniel Schauenberg
      • dschauenberg@etsy.com
      • @mrtazz
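
Slide 9 mentions heavy sampling of StatsD traffic. As a rough illustration of what client-side sampling can look like, the sketch below assumes the common StatsD counter line format (`name:value|c|@rate`) sent over UDP. The function names, the metric name in the usage note, and the default host and port are hypothetical; this is not Etsy's client code.

```python
import random
import socket


def format_counter(name, value=1, sample_rate=1.0):
    """Build a StatsD counter line.

    The @rate suffix tells the server what fraction of events was sent,
    so it can scale the count back up.
    """
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def send_counter(name, value=1, sample_rate=1.0,
                 addr=("127.0.0.1", 8125), rng=random.random):
    """Send a counter with probability sample_rate.

    Returns True if a packet was sent and False if the event was sampled out.
    addr is an assumed local StatsD endpoint, not a real deployment.
    """
    if sample_rate < 1.0 and rng() >= sample_rate:
        return False  # dropped by sampling, nothing goes on the wire
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        payload = format_counter(name, value, sample_rate).encode("ascii")
        sock.sendto(payload, addr)
    finally:
        sock.close()
    return True
```

With `sample_rate=0.1`, a call such as `send_counter("web.requests", sample_rate=0.1)` sends roughly one packet for every ten events, which is one way a single StatsD instance can absorb traffic from many web and API servers.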
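
Slides 11 and 12 describe functionally sharded Graphite relays. The sketch below shows one way such routing could work: each metric path is matched against an ordered list of patterns and assigned to the first matching shard, with a catch-all at the end. The shard names are taken from the slide 12 labels, but every pattern here is an assumption made for illustration, and the rule table is not Etsy's relay configuration. The last helper formats a value in Graphite's plaintext line format (`path value timestamp`).

```python
import re

# Hypothetical routing rules: (shard name, pattern on the metric path).
# Order matters: the first matching rule wins.
RULES = [
    ("statsdtimers", re.compile(r"^stats\.timers\.")),
    ("statsdcounts", re.compile(r"^stats_counts\.")),
    ("statsd", re.compile(r"^stats\.")),
    ("chef", re.compile(r"^chef\.")),
    ("logster", re.compile(r"^logster\.")),
    ("search", re.compile(r"^search\.")),
]
DEFAULT_SHARD = "generic"  # catch-all for everything that matches no rule


def pick_shard(metric_path):
    """Return the name of the shard that should receive this metric."""
    for shard, pattern in RULES:
        if pattern.search(metric_path):
            return shard
    return DEFAULT_SHARD


def plaintext_line(metric_path, value, timestamp):
    """Format one data point as a Graphite plaintext protocol line."""
    return f"{metric_path} {value} {int(timestamp)}\n"
```

Sharding by function rather than by hash keeps related metrics together, so a noisy producer (for example StatsD timers) only loads its own shard.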
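
Slide 16 describes logster as extracting metrics from log files once a minute. The sketch below illustrates that general idea only and does not use logster's actual parser API: it counts HTTP status classes in a batch of access-log lines and converts the counts to per-second rates over an assumed 60-second interval. The log line shape and the metric names are assumptions.

```python
import re
from collections import Counter

# Assumed log shape: common-log-style lines where the three-digit status
# code follows the quoted request, e.g. ... "GET / HTTP/1.1" 200 512
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def count_status_classes(lines):
    """Count lines per HTTP status class (2xx, 3xx, ...); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[f"http_{match.group(1)[0]}xx"] += 1
    return counts


def to_rates(counts, interval_seconds=60):
    """Convert raw counts to per-second rates for the collection interval."""
    return {name: n / interval_seconds for name, n in counts.items()}
```

A cron-driven run would also need to remember where the previous run stopped reading, so that each minute only covers new lines; that bookkeeping is left out here.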
