A Whirlwind Tour of Etsy's Monitoring Stack
Daniel Schauenberg
dschauenberg@etsy.com
@mrtazz
Item by TheBackPackShoppe
How comfortable are you deploying a change right now?
“If this is your first day at Etsy, you deploy the site”
Ganglia
• System level metrics
• Instance per DC/environment
• > 220k RRD files
• Fully configured through Chef role attributes
Rainbow Graphs!
StatsD
• Single instance on one server
• Traffic mostly from 70 Web & 24 API servers
• Node.js
• Heavy Sampling
• Graphite as backend
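
To illustrate "Heavy Sampling": a StatsD client can send only a fraction of its counter increments and mark each datagram with `|@rate` so the server scales the count back up. The following is a minimal sketch, not Etsy's client code. The metric name, the local host, and the injectable `rng` parameter are assumptions made for the example; 8125 is StatsD's default UDP port.

```python
import random
import socket


def format_counter(name, value=1, sample_rate=1.0):
    """Build a StatsD counter datagram, e.g. 'web.requests:1|c|@0.1'."""
    payload = f"{name}:{value}|c"
    if sample_rate < 1.0:
        # The @rate suffix tells the server how much to scale the count up.
        payload += f"|@{sample_rate}"
    return payload


def send_counter(name, value=1, sample_rate=1.0,
                 host="127.0.0.1", port=8125, rng=random.random):
    """Send the counter with probability sample_rate; return True if sent."""
    if sample_rate < 1.0 and rng() >= sample_rate:
        return False  # dropped by client-side sampling
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # UDP is fire-and-forget, so the web request never waits on StatsD.
        datagram = format_counter(name, value, sample_rate).encode("ascii")
        sock.sendto(datagram, (host, port))
    finally:
        sock.close()
    return True
```

With a sample rate of 0.1, roughly one increment in ten reaches the single StatsD instance, which keeps UDP traffic from the web and API servers manageable.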
Graphite
• Application level metrics
• 96G RAM, 20 Cores, 7.3T SSD RAID 10
• 525k metrics/minute
• Mirrored Master/Master Setup
• Functionally sharded relays
(Diagram: CNAME -> relays -> caches. Relay shards: statsdtimers, statsdcounts, statsd, chef, logster, fqld, search, generic)
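
One way to picture "functionally sharded relays" is a first-match rule table that maps a metric name to a shard. The sketch below is illustrative only: the shard names are the ones shown on the slide, but the regex patterns and the example metric names are assumptions, and this is not carbon's relay code.

```python
import re

# Shard names are taken from the slide; every pattern here is an assumption.
# Order matters: the more specific StatsD timer rule must come before the
# catch-all StatsD rule.
RULES = [
    ("statsdtimers", re.compile(r"^stats\.timers\.")),
    ("statsdcounts", re.compile(r"^stats_counts\.")),
    ("statsd", re.compile(r"^stats\.")),
    ("chef", re.compile(r"^chef\.")),
    ("logster", re.compile(r"^logster\.")),
    ("fqld", re.compile(r"^fqld\.")),
    ("search", re.compile(r"^search\.")),
]
DEFAULT_SHARD = "generic"


def route(metric_name):
    """Return the shard for a metric name; the first matching rule wins."""
    for shard, pattern in RULES:
        if pattern.search(metric_name):
            return shard
    return DEFAULT_SHARD
```

Sharding by function rather than by hash means a noisy source, such as StatsD timers, only loads its own shard.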
Syslog-Ng
• Web, Search, Gearman, Photos, Nagios, Network, VPN
• 1.2GB written/minute
• Chef role attribute based config
• Rule ordering!
github.com/etsy/logster
• Extract metrics from log files
• Written in Python
• Runs every minute via cron
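
The idea of extracting metrics from log files can be sketched as a small parser: read each line, count what matters, and report per-second rates for the interval since the last run. This is a standalone sketch with the same general shape as a log parser, not logster's actual API. The class name, the metric names, and the access-log format are assumptions.

```python
import re
from collections import Counter

# Assumes common-log-format lines, e.g. '... "GET / HTTP/1.1" 200 2326'.
STATUS_RE = re.compile(r'" (\d{3}) ')


class StatusCodeParser:
    """Hypothetical parser that counts HTTP responses by status class."""

    def __init__(self):
        self.counts = Counter()

    def parse_line(self, line):
        """Update counters from one log line; unparseable lines are skipped."""
        match = STATUS_RE.search(line)
        if match:
            status_class = match.group(1)[0]  # '2' for 200, '4' for 404, ...
            self.counts[f"http_{status_class}xx"] += 1

    def get_state(self, duration_seconds):
        """Return per-second rates for the period covered by this run."""
        return {name: count / duration_seconds
                for name, count in sorted(self.counts.items())}
```

A cron job running every minute would feed the parser only the lines written since the previous run and pass 60 as the duration.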
Splunk
• Indexes all of our log files
• Easy search for patterns
• Saved searches for interesting ones
• Basically using it as a glorified grep
Logstash
• Experiment status
• Makes it easier to integrate different sources
• Easy to set up in dev environment
• Trying to figure out where/how it fits into our infrastructure
Eventinator
• Tracks all events in our infrastructure
• Chef runs and changes
• DNS changes
• Network
• Deploys
• Server provisioning and decommissioning
• ~ 12 million events in the last 2 years
Chef
• rules everything around me
• Same cookbooks on prod and dev
• every node runs Chef every 10 minutes
• ton of knife plugins and handlers
> 120 recipes
Nagios
• 2 instances in each DC/environment
• Fully Chef generated configuration
• Service checks and contacts in git
• Notifications via email->SMS gateway
• ~75% ops on-call
github.com/lozzd/nagdash
Nagios Herald
• Add context to Nagios alerts
• What are the first 5 things you do when you get paged?
• You already have the phone in your hand
• nagios notification handler
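
Adding context to an alert can be sketched as a formatter that runs the checks an on-call engineer would otherwise run by hand and appends their output to the notification. Nagios Herald itself is a separate project; the sketch below is not its API. The alert fields, the provider registry, and the example provider are all hypothetical.

```python
def format_notification(alert, context_providers):
    """Build the notification body for one alert.

    alert: dict with 'host', 'service', 'state' and 'output' keys (assumed).
    context_providers: dict mapping a service name to a list of callables
    that take the alert and return a string of extra context (assumed).
    """
    lines = [
        f"{alert['state']}: {alert['service']} on {alert['host']}",
        alert["output"],
    ]
    for provider in context_providers.get(alert["service"], []):
        try:
            lines.append(provider(alert))
        except Exception as exc:
            # A broken helper must never stop the page from going out.
            lines.append(f"(context unavailable: {exc})")
    return "\n".join(lines)


# Example usage with a stubbed provider; a real one might gather disk usage.
example_alert = {
    "host": "web01",
    "service": "Disk Space",
    "state": "CRITICAL",
    "output": "DISK CRITICAL - /var 97% used",
}
example_providers = {"Disk Space": [lambda alert: "top dirs: /var/log 40G"]}
example_body = format_notification(example_alert, example_providers)
```

The context arrives in the same message as the page, so the first few diagnostic steps are already done when the phone is in your hand.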
The Toys are real
There’s another side of heaven
Ops Weekly
Summary
• Set of trusted tools
• Enhance them where they fall short
• Try out new things
• Write tools where applicable
• Continuous monitoring and adaptation
codeascraft.com
etsy.com/codeascraft/talks
etsy.github.com
etsy.com/careers
Questions?