Metrics-Driven Engineering at Etsy
 


    • Metrics-driven Engineering at Etsy: Mike Brittain, mike@etsy.com, @mikebrittain
    • Logs, Graphs, Trends, and Correlations
    • Making Decisions
    • How many visitors are using this thing?
    • Can we deploy that to 100% of our visitors?
    • Did we make it faster?
    • Did I just break something?
    • Q. Who makes the graphs? A. Well, the Ops team manages the network, racks the servers, installs the monitoring tools, wears the pagers, blah, blah, blah...
    • (but...) Engineers build the application.
    • Dev + Ops
    • Access
    • Yes No
    • “Engineers are too busy meeting our product deadlines.”
    • Here’s the big secret...
    • Cacti (network, SNMP)
      Ganglia (machines)
      Graphite (application)
      Splunk (log analysis, nightly reports)
      Nagios (alerting)
    • Logging
    • Logger::log_error("User login failed. Reason: $msg for $username", "login");
    • web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
    • Logster
    • Forked from ganglia-logtailer...
      - Daemon mode (only cron mode)
      + Support for Graphite
      + Simplified parsing scripts
    • web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0201 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a grue.
      web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
      web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
      web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
      web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0003 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a grue.
      web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
    • Fatals, Errors, Warnings
    • StatsD
    • StatsD::increment("logins.success");
      StatsD::timing("gearman.time", $msec);
    • 90th pct, average, lower: all graphed from one StatsD::timing("gearman.time", $msec); call
    • Ad hoc: name value timestamp\n
    • echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
    • Trends + Events: target=drawAsInfinite(events.deploy.site)
    • What Happened?
    • 16,000 metrics in Graphite (plus 32,000 metrics in Ganglia)
    • Dashboards
    • Mix & Match Dashboards
    • Hard:
      <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"></a>
    • Easy:
      $g = new Graphite($time);
      $g->setTitle("File Not Found");
      $g->addMetric("webs.errorLog.notExist", "#00cc00");
      $g->showDeploys(true);
      echo $g->getDashboardHTML(280, 220);
    • 20 dashboards by 25 engineers
    • Application health correlated with events
    • High-level visibility
    • Low MTTD
    • Validation
    • Confidence
    • codeascraft.etsy.com
      github.com/etsy/statsd
      github.com/etsy/logster
      bitbucket.org/maplebed/ganglia-logtailer
    • Q&A: Does this sound like fun? Get in touch with us: chad@etsy.com, kellan@etsy.com, kastner@etsy.com, mike@etsy.com
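The logging slides show one `Logger::log_error(...)` call producing lines of the form `web0054 [Fri Mar 04 16:27:48 2011] [info] [login] ...`. Etsy's Logger class is not shown in the deck, so the following is only a hypothetical sketch of a formatter that emits that layout. `format_log_line`, `log_message`, and the `/tmp/app.log` default path are illustrative names, and the timestamp is rendered in UTC so the output is deterministic.

```php
<?php
// Hypothetical sketch, not Etsy's Logger: format entries to match the example lines
//   "<host> [<date>] [<level>] [<namespace>] <message>"

function format_log_line(string $host, int $ts, string $level, string $ns, string $msg): string
{
    // gmdate('D M d H:i:s Y') renders e.g. "Fri Mar 04 16:27:48 2011" (UTC).
    return sprintf('%s [%s] [%s] [%s] %s', $host, gmdate('D M d H:i:s Y', $ts), $level, $ns, $msg);
}

// Append one formatted line to a log file; the default path is an assumption.
function log_message(string $level, string $msg, string $ns, string $file = '/tmp/app.log'): void
{
    $line = format_log_line((string) gethostname(), time(), $level, $ns, $msg);
    // Message type 3 tells error_log() to append to the given file.
    error_log($line . PHP_EOL, 3, $file);
}
```

Switching `gmdate()` to `date()` would render the web server's local time instead. The namespace field ("login") is what later lets a parser or a grep pick out one feature's messages.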
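The Logster slides describe a cron-driven tool that turns a noisy error log, like the example above, into per-level counts (fatals, errors, warnings) that Graphite can plot. Logster itself is a Python tool with pluggable parser scripts; the PHP below is only a minimal sketch of the same idea, assuming the `[level]` field directly follows the bracketed timestamp as in the example lines. `count_levels`, `to_graphite_lines`, and the `webs.errorLog` prefix are hypothetical.

```php
<?php
// Minimal sketch of the Logster idea (not Logster's code): count new log lines
// by severity and emit Graphite plaintext metrics, one line per level.

function count_levels(iterable $lines): array
{
    $counts = ['fatal' => 0, 'error' => 0, 'warning' => 0];
    foreach ($lines as $line) {
        // "<host> [<time>] [<level>] ...": capture the level after the time field.
        if (preg_match('/^\S+ \[[^\]]*\] \[(fatal|error|warning)\]/', $line, $m)) {
            $counts[$m[1]]++;
        }
        // Lines with other levels (e.g. [info]) are ignored.
    }
    return $counts;
}

function to_graphite_lines(string $prefix, array $counts, int $ts): array
{
    $out = [];
    foreach ($counts as $level => $n) {
        // Graphite plaintext format: "name value timestamp\n".
        $out[] = sprintf("%s.%s %d %d\n", $prefix, $level, $n, $ts);
    }
    return $out;
}
```

A real cron run would also need to remember its position in the file so each invocation counts only newly appended lines; Logster relies on logtail for that.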
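The StatsD slide calls `StatsD::increment("logins.success")` and `StatsD::timing("gearman.time", $msec)`. A client for this only has to send small UDP datagrams in StatsD's text format: `<bucket>:<value>|c` for counters and `<bucket>:<value>|ms` for timers. The class below is a hypothetical minimal client, not Etsy's own; the host and port are assumptions (8125 is StatsD's conventional default).

```php
<?php
// Hypothetical minimal StatsD client; host and port are assumptions.
final class StatsD
{
    public static string $host = '127.0.0.1';
    public static int $port = 8125;

    // Counter wire format: "<bucket>:<delta>|c".
    public static function counterPacket(string $bucket, int $delta = 1): string
    {
        return "{$bucket}:{$delta}|c";
    }

    // Timer wire format, in milliseconds: "<bucket>:<ms>|ms".
    public static function timingPacket(string $bucket, int $msec): string
    {
        return "{$bucket}:{$msec}|ms";
    }

    public static function increment(string $bucket): void
    {
        self::send(self::counterPacket($bucket));
    }

    public static function timing(string $bucket, int $msec): void
    {
        self::send(self::timingPacket($bucket, $msec));
    }

    // Fire-and-forget UDP: errors are swallowed so metrics never break a request.
    private static function send(string $packet): void
    {
        $fp = @fsockopen('udp://' . self::$host, self::$port, $errno, $errstr);
        if ($fp === false) {
            return;
        }
        fwrite($fp, $packet);
        fclose($fp);
    }
}
```

To time a block of work: `$start = microtime(true);`, do the work, then `StatsD::timing("gearman.time", (int) round((microtime(true) - $start) * 1000));`. Because UDP is connectionless, an unreachable StatsD server does not slow the caller down.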
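The timer slide's three series (90th percentile, average, lower) are aggregates of the individual `StatsD::timing` samples collected between flushes. The following sketches that aggregation for one bucket, using the nearest-rank definition of the 90th percentile. StatsD's own threshold rounding may differ slightly, and `summarize_timer` and its key names are illustrative.

```php
<?php
// Sketch: summarize one flush interval of timer samples (milliseconds).
function summarize_timer(array $samples, float $pct = 90.0): array
{
    if ($samples === []) {
        throw new InvalidArgumentException('no samples to summarize');
    }
    sort($samples, SORT_NUMERIC);
    $n = count($samples);
    // Nearest rank: smallest sample with at least $pct% of samples at or below it.
    $rank = (int) ceil($pct * $n / 100);
    return [
        'lower' => $samples[0],
        'mean' => array_sum($samples) / $n,
        'upper_' . (int) $pct => $samples[max($rank, 1) - 1],
    ];
}
```

The 90th percentile is plotted alongside the mean because a handful of slow requests can hide inside an average that still looks healthy.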
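The ad hoc slide's one-liner writes Graphite's plaintext format, `name value timestamp\n`, to port 2003 with `nc`. The same thing from PHP could be sketched as follows; `graphite_line`, `send_to_graphite`, and the one-second connect timeout are assumptions.

```php
<?php
// Sketch of Graphite's plaintext protocol from PHP.

// One datapoint: "name value timestamp\n".
function graphite_line(string $name, float|int $value, int $ts): string
{
    return sprintf("%s %s %d\n", $name, $value, $ts);
}

// Write one or more lines to the carbon plaintext listener over TCP.
function send_to_graphite(string $host, string $lines, int $port = 2003): bool
{
    $fp = @fsockopen("tcp://{$host}", $port, $errno, $errstr, 1.0);
    if ($fp === false) {
        return false;
    }
    $ok = fwrite($fp, $lines) === strlen($lines);
    fclose($fp);
    return $ok;
}
```

For example, a deploy script might call `send_to_graphite('graphite.example.com', graphite_line('events.deploy.site', 1, time()));`, where the host is a placeholder.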
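The Trends + Events slide overlays deploys on a metric with `target=drawAsInfinite(events.deploy.site)`. Graphite's render API accepts `target` more than once, so a URL builder has to emit one `target=` pair per series. The helper below is a hypothetical sketch; the host in the usage is a placeholder.

```php
<?php
// Sketch: build a Graphite render URL that overlays event series on a metric.
function render_url(string $base, string $metric, array $eventMetrics, array $params = []): string
{
    $targets = [$metric];
    foreach ($eventMetrics as $event) {
        // drawAsInfinite() draws a vertical line at every nonzero datapoint.
        $targets[] = "drawAsInfinite({$event})";
    }
    $parts = [];
    foreach ($params as $key => $value) {
        $parts[] = rawurlencode($key) . '=' . rawurlencode((string) $value);
    }
    // Repeated "target" keys are written by hand: http_build_query() would
    // turn an array into target[0]=..., target[1]=...
    foreach ($targets as $target) {
        $parts[] = 'target=' . rawurlencode($target);
    }
    return $base . '/render?' . implode('&', $parts);
}
```

Since deploys are recorded as a value of 1 at the deploy timestamp, each deploy shows up as a vertical line across the metric's graph.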
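The Hard and Easy slides contrast hand-written render-URL markup with a small `Graphite` helper class. That class is not published in the deck, so the following is a hypothetical re-creation that produces markup equivalent to the Hard example: a legend-free thumbnail linked to an 800x600 graph. It assumes `$time` is a Graphite `from` value such as `-1hours`; the host is a placeholder, and the deploy series and colors are copied from the Hard markup.

```php
<?php
// Hypothetical re-creation of the "Easy" helper; not Etsy's actual class.
final class Graphite
{
    private const RENDER = 'http://graphite.example.com/render'; // placeholder host
    // Deploy series and colors copied from the "Hard" example.
    private const DEPLOYS = [
        'deploys.config.production' => '#0000ff',
        'deploys.web.production' => '#ff0000',
        'deploys.search.production' => '#006633',
        'deploys.imagestorage.other' => '#cc6600',
    ];

    private string $title = '';
    /** @var array<string, string> target => color, in drawing order */
    private array $series = [];
    private bool $deploys = false;

    // Assumption: $time is a Graphite "from" value, e.g. "-1hours".
    public function __construct(private string $time)
    {
    }

    public function setTitle(string $title): void
    {
        $this->title = $title;
    }

    public function addMetric(string $name, string $color): void
    {
        $this->series[$name] = $color;
    }

    public function showDeploys(bool $show): void
    {
        $this->deploys = $show;
    }

    // Thumbnail <img> linking to a full-size graph of the same targets.
    public function getDashboardHTML(int $width, int $height): string
    {
        $full = htmlspecialchars($this->url(800, 600, false));
        $thumb = htmlspecialchars($this->url($width, $height, true));
        return "<a href=\"{$full}\"><img src=\"{$thumb}\"></a>";
    }

    private function url(int $width, int $height, bool $hideLegend): string
    {
        $query = ['from' => $this->time, 'width' => $width, 'height' => $height, 'title' => $this->title];
        if ($hideLegend) {
            $query['hideLegend'] = 1;
        }
        $query['yMin'] = 0;
        $series = $this->series;
        if ($this->deploys) {
            foreach (self::DEPLOYS as $name => $color) {
                // Each deploy event becomes a vertical line on the graph.
                $series["drawAsInfinite({$name})"] = $color;
            }
        }
        $parts = [http_build_query($query)];
        foreach (array_keys($series) as $target) {
            $parts[] = 'target=' . rawurlencode($target);
        }
        // Colors are matched to targets by position.
        $parts[] = 'colorList=' . implode(',', array_map('rawurlencode', $series));
        return self::RENDER . '?' . implode('&', $parts);
    }
}
```

Unlike the hand-written version, this helper passes its URLs through `htmlspecialchars()`, so each `&` becomes `&amp;` inside the HTML attributes, as HTML requires.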