Metrics-Driven Engineering at Etsy

Published in: Technology

Transcript

  • 1. Metrics-driven Engineering at Etsy. MIKE BRITTAIN, mike@etsy.com, @mikebrittain
  • 2. Logs, Graphs, Trends, and Correlations
  • 3. Making Decisions
  • 4. How many visitors are using this thing?
  • 5. Can we deploy that to 100% of our visitors?
  • 6. Did we make it faster?
  • 7. Did I just break something?
  • 8. Q. Who makes the graphs? A. Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 9. (but...) Engineers build the application.
  • 10. Dev + Ops
  • 11. Access
  • 12. Yes No
  • 13. “Engineers are too busy meeting our product deadlines.”
  • 14. Here’s the big secret...
  • 15. Cacti (network, SNMP); Ganglia (machines); Graphite (application); Splunk (log analysis, nightly reports); Nagios (alerting)
  • 16. Logging
  • 17. Logger::log_error("User login failed. Reason: $msg for $username", "login");
  • 18. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
  • 19. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
  • 20. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
  • 21. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
  • 22. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] User login failed. Reason: wrong password for ...
  • 23. Logster
  • 24. Forked from ganglia-logtailer... Removed: daemon mode (cron mode only). Added: support for Graphite, simplified parsing scripts.
  • 25. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
    web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
    web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
    web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
    web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
    web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
    web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
    web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
    web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
    web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
    web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
    web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
    web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
  • 26. Fatals Errors Warnings
  • 27. StatsD
  • 28. StatsD::increment("logins.success"); StatsD::timing("gearman.time", $msec);
  • 29. 90th pct, average, lower: StatsD::timing("gearman.time", $msec);
  • 30. Ad hoc: name value timestamp\n
  • 31. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
  • 32. Trends + Events: target=drawAsInfinite(events.deploy.site)
  • 33. What Happened?
  • 34. 16,000 metrics in Graphite (plus 32,000 metrics in Ganglia)
  • 35. Dashboards
  • 36. Mix & Match Dashboards
  • 37. Hard: <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"></a>
  • 38. Easy: $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); $g->showDeploys(true); echo $g->getDashboardHTML(280, 220);
  • 39. 20 dashboards by 25 engineers
  • 40. Application health correlated with events
  • 41. High-level visibility
  • 42. Low MTTD
  • 43. Validation
  • 44. Confidence
  • 45. codeascraft.etsy.com; github.com/etsy/statsd; github.com/etsy/logster; bitbucket.org/maplebed/ganglia-logtailer
  • 46. Q&A. Does this sound like fun? Get in touch with us: chad@etsy.com, kellan@etsy.com, kastner@etsy.com, mike@etsy.com
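
Slides 23 to 26 describe tailing error logs and graphing fatals, errors, and warnings. The sketch below shows one way such a count could be produced; it is not Logster's actual parser. The regular expression is modeled only on the sample lines in slide 25, and the metric names under `webs.errorLog` are hypothetical (only `webs.errorLog.notExist` appears in the slides).

```python
import re
from collections import Counter

# Assumed line shape, modeled on the samples in slide 25:
#   web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
LINE_RE = re.compile(r"^(?P<host>\S+) \[[^\]]*\] \[(?P<level>\w+)\]")


def count_levels(lines):
    """Count log lines per severity level; lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts


def to_graphite_lines(counts, timestamp, prefix="webs.errorLog"):
    """Render counts as 'name value timestamp' lines (metric names are hypothetical)."""
    return [f"{prefix}.{level} {n} {timestamp}" for level, n in sorted(counts.items())]


sample = [
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.",
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
    "web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!",
    "web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!",
]
counts = count_levels(sample)
for line in to_graphite_lines(counts, 1300000000):  # arbitrary example timestamp
    print(line)
```

Run from cron once a minute against the newly appended part of a log, a script of this shape would yield one data point per severity per minute, which is the kind of series slide 26 graphs.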

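Slides 28 to 31 show metrics leaving the application as short text lines. As a rough sketch, the helpers below format StatsD counter and timer datagrams, summarize a batch of timings into the three series named on slide 29, and build a Graphite plaintext line in the `name value timestamp\n` shape from slide 30. The summary uses a simple nearest-rank percentile, which is an assumption and not necessarily how StatsD computes it. The host in `send_udp` is a placeholder, and the function is defined but never called here.

```python
import math
import socket


def statsd_increment(name, count=1):
    """Format a StatsD counter datagram, e.g. 'logins.success:1|c'."""
    return f"{name}:{count}|c"


def statsd_timing(name, msec):
    """Format a StatsD timer datagram, e.g. 'gearman.time:320|ms'."""
    return f"{name}:{msec}|ms"


def send_udp(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; the host is a placeholder assumption."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


def summarize_timings(values, pct=90):
    """Return the lowest value, the mean, and a nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return {
        "lower": ordered[0],
        "mean": sum(ordered) / len(ordered),
        f"upper_{pct}": ordered[rank - 1],
    }


def graphite_line(name, value, timestamp):
    """Format one line of Graphite's plaintext protocol."""
    return f"{name} {value} {timestamp}\n"


print(statsd_increment("logins.success"))
print(statsd_timing("gearman.time", 320))
print(summarize_timings(list(range(10, 101, 10))))
print(graphite_line("events.deploy.site", 1, 1300000000), end="")
```

Sending over UDP means the application never waits on the metrics server, so instrumenting a code path costs one small datagram whether or not anything is listening.
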
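Slides 32, 37, and 38 contrast hand-written Graphite render URLs with a small helper that builds them. The sketch below illustrates that idea in Python; it is not the PHP `Graphite` class from slide 38. The parameter names (`target`, `from`, `width`, `height`, `title`) and the `drawAsInfinite` wrapper are taken from slide 37, while `render_url` and the host `graphite.example.com` are hypothetical.

```python
from urllib.parse import urlencode


def render_url(base, metric, events=(), options=None):
    """Build a Graphite render URL with deploy events overlaid as vertical lines."""
    params = [("target", metric)]
    # Each event series is wrapped in drawAsInfinite(), as on slides 32 and 37.
    params += [("target", f"drawAsInfinite({event})") for event in events]
    params += list((options or {}).items())
    return f"{base}/render?{urlencode(params)}"


url = render_url(
    "http://graphite.example.com",  # placeholder host
    "webs.errorLog.notExist",
    events=["deploys.web.production"],
    options={"from": "-1hours", "width": 280, "height": 220, "title": "File Not Found"},
)
print(url)
```

Keeping the deploy overlays inside the helper means every dashboard graph gets them by default, which is what makes it possible to line a change in a metric up against a deploy.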