Metrics driven engineering (velocity 2011)
Published in Technology
Transcript

  • 1. METRICS-DRIVEN ENGINEERING at Etsy. Kellan Elliott-McCrea, VP of Eng. kellan@etsy.com @kellan (Tuesday, June 5, 12)
  • 4. What is Etsy?
  • 5. 8.5+ million items in the marketplace
  • 6. 400,000+ active
  • 7. $300+ million in sales in 2010; ~$41 million/month
  • 8. > $1000 / minute
  • 9. > 1 billion page views / month
  • 10. business in over 150 countries
  • 11. deploy the site, every ~20 minutes
  • 12. engineering team grew ~4x in 2010
  • 13. Metrics?
  • 14. Logs, Graphs, Trends, and Correlations
  • 15. Metrics Driven?
  • 16. Making Decisions
  • 17. How many visitors are using this thing?
  • 18. Can we deploy that to 100% of our visitors?
  • 19. Did we make it faster?
  • 20. Did I just break something?
  • 21. Q. WHO MAKES THESE GRAPHS? A. Well, the Ops team manages the racks, the network, the servers, installed monitoring tools, wears the pagers, blah, blah, blah...
  • 22. but... Engineers build the application.
  • 23. Dev + Ops
  • 24. ACCESS
  • 25. Yes! No.
  • 26. “Engineers are too busy!”
  • 27. Here’s the BIG SECRET...
  • 28. ... MAKE IT EASY!
  • 29. Simple, open source tools
  • 30. Cacti (network, SNMP); Ganglia (machines); Graphite (application); Splunk (log analysis, nightly reports); Nagios (alerting)
  • 31. Ganglia ★ cluster oriented ★ huge community contributed recipes ★ 2.0 released today (including several Flickr and Etsy patches!) ★ gmetad makes it easy to track custom metrics
  • 33. Graphite ★ super flexible collection and display ★ per metrics buckets ★ single instance ★ super easy to write and use custom display functions
  • 34. Logging
  • 35. Logger::log_error("User login failed. Reason: $msg for $username", "login");
  • 36-38. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ...
  • 39-41. web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ...
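The log line on these slides has a regular shape (host, timestamp, level, namespace, a numeric field, message), which is what makes it easy to count and graph later. A minimal sketch of pulling those fields apart, assuming the bracketed layout shown above; the field names are illustrative, and the slides do not say what the numeric field represents, so it is simply called `id` here.

```python
import re
from datetime import datetime

# Assumed layout, taken from the example line on the slides:
# host [timestamp] [level] [namespace] [numeric id] message
LINE_RE = re.compile(
    r"^(?P<host>\S+) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<level>\w+)\] "
    r"\[(?P<namespace>[^\]]+)\] "
    r"\[(?P<id>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like "Fri Mar 04 16:27:48 2011".
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%a %b %d %H:%M:%S %Y"
    )
    return fields
```

Once the level and namespace are separate fields, counting "errors in the login namespace per minute" becomes a simple group-by.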
  • 42. Counting and Timing http://code.flickr.com/blog/2008/10/27/counting-timing/
  • 43. Logster
  • 44. Logster https://github.com/etsy/logster
  • 45. Forked from ganglia-logtailer: - daemon mode (cron mode only); + support for Graphite; + simplified parsing scripts
  • 46. web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
    web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
    web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0201 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a grue.
    web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
    web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
    web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
    web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
    web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
    web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
    web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
    web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
    web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
    web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
    web0003 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a grue.
    web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
    web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
  • 47. Fatals, Errors, Warnings
  • 48. ★ runs out of cron ★ maintains a cursor into log files ★ supports ganglia and graphite ★ custom parsers much easier to write than gmetad
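The two ideas on these slides, keeping a cursor into the log file between cron runs and turning a wall of log noise into three counters, can be sketched in a few lines. This is not Logster's code or API; the function names, the cursor file, and the byte-offset approach are assumptions made for illustration.

```python
import os
import re
from collections import Counter

# Levels counted on the "Fatals, Errors, Warnings" slide.
LEVEL_RE = re.compile(r"\[(warning|error|fatal)\]")


def read_new_lines(log_path, cursor_path):
    """Return lines appended since the previous run.

    The cursor is a byte offset stored in a small side file (an assumption
    for this sketch), so each cron run only reads what is new.
    """
    offset = 0
    if os.path.exists(cursor_path):
        with open(cursor_path) as f:
            offset = int(f.read().strip() or 0)
    if offset > os.path.getsize(log_path):
        offset = 0  # log was rotated or truncated, so start from the top
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    with open(cursor_path, "w") as f:
        f.write(str(new_offset))
    return data.decode("utf-8", errors="replace").splitlines()


def count_levels(lines):
    """Count warning/error/fatal occurrences in the given lines."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Each run would then submit the three counts to Ganglia or Graphite; that submission step is left out here.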
  • 49. Apache access logs
  • 50. LogFormat "%h %l %u %t \"%r\" %>s %b" common
  • 51. LogFormat "%{X-Forwarded-For}i %{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 52. %{etsy_ab_selections}n
  • 53. %{etsy_uaid}n
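Because the extended LogFormat writes application notes such as `etsy_ab_selections` and `etsy_uaid` into every access-log line, a log parser can map each line back to named fields. A rough sketch, assuming the field order of the extended format on slide 51; the Python field names are illustrative, and the tokenizer is deliberately naive (it would miscount if a header value contained spaces outside quotes, or escaped quotes inside them).

```python
import re

# Field order assumed from the extended LogFormat on slide 51.
FIELDS = [
    "x_forwarded_for", "true_client_ip", "ident", "user", "time",
    "request", "status", "bytes", "referer", "user_agent",
    "etsy_shop_id", "etsy_uaid", "server_name", "etsy_ab_selections",
    "etsy_request_uuid", "etsy_api_consumer_key", "etsy_api_method_name",
    "php_memory_usage_bytes", "php_time_microsec", "request_time_usec",
]

# A token is a [bracketed] span, a "quoted" span, or a run of non-spaces.
TOKEN_RE = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_access_line(line):
    """Map one access-log line to a dict, or None if the token count is off."""
    tokens = TOKEN_RE.findall(line)
    if len(tokens) != len(FIELDS):
        return None
    # Drop the surrounding quotes or brackets from each token.
    return {name: tok.strip('"[]') for name, tok in zip(FIELDS, tokens)}
```

With lines parsed this way, questions like "how slow is the page for visitors in a given A/B variant" reduce to grouping `request_time_usec` by `etsy_ab_selections`.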
  • 54. Graphs
  • 55. “If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it.” - Erik Kastner http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
  • 57. StatsD
  • 58. StatsD https://github.com/etsy/statsd/
  • 59. StatsD::increment("logins.success"); StatsD::timing("gearman.time", $msec);
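Those two PHP calls boil down to tiny UDP datagrams in StatsD's plain-text format: `name:value|c` for a counter and `name:value|ms` for a timer. A minimal sketch of an equivalent client; the function names are hypothetical, and the host and port defaults (StatsD conventionally listens on UDP 8125) are assumptions about the deployment.

```python
import socket


def format_counter(name, delta=1):
    """Build a counter datagram, e.g. 'logins.success:1|c'."""
    return f"{name}:{delta}|c"


def format_timing(name, msec):
    """Build a timer datagram, e.g. 'gearman.time:320|ms'."""
    return f"{name}:{msec}|ms"


def send(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP: no reply is expected, and a dropped
    packet should never slow down or break the application."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()


# Usage (not executed here):
#   send(format_counter("logins.success"))
#   send(format_timing("gearman.time", 320))
```

The fire-and-forget design is much of what makes instrumenting code cheap enough that engineers actually do it.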
  • 60. 90th pct, average, lower: StatsD::timing("gearman.time", $msec);
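A single timing call therefore yields several series per flush interval: a 90th percentile, an average, and a lower bound. A sketch of how such a summary might be computed from the raw samples, using a nearest-rank percentile; StatsD's own calculation may differ in detail, and the key names below are illustrative.

```python
import math


def summarize_timings(samples, pct=90):
    """Summarize one interval's timing samples (milliseconds)."""
    if not samples:
        return None
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest value with at least pct% of
    # the samples at or below it.
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return {
        "lower": ordered[0],
        "average": sum(ordered) / len(ordered),
        f"upper_{pct}": ordered[rank - 1],
    }
```

Graphing the percentile alongside the average matters because a handful of very slow requests can hide behind a healthy-looking mean.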
  • 61. Ad hoc: name value timestamp
  • 62. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
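The shell one-liner works because Graphite's plaintext listener accepts newline-terminated `name value timestamp` lines over TCP. The same thing from application code might look like the sketch below; the function names are hypothetical, and the host is left as a parameter since it depends on the deployment.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext-protocol line: 'name value unix_timestamp\\n'."""
    if timestamp is None:
        timestamp = time.time()
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(line, host, port=2003, timeout=5.0):
    """Open a TCP connection, send one line, and close."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


# Usage (not executed here; the hostname is a placeholder):
#   send_metric(format_metric("events.deploy.site", 1), "graphite.example.com")
```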
  • 63. Correlations
  • 64. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
  • 65. Trends + Events: target=drawAsInfinite(events.deploy.site)
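Overlaying events on a trend is a matter of adding a second `target` wrapped in `drawAsInfinite()`, which draws each non-zero datapoint as a vertical line. A small sketch of building such a render URL; the base URL and the trend metric name in the usage comment are placeholders, not values from the talk.

```python
from urllib.parse import urlencode


def render_url(base, targets, events=(), options=None):
    """Build a Graphite render URL with trend targets plus event overlays."""
    params = list((options or {}).items())
    params += [("target", t) for t in targets]
    # Each event series becomes a set of vertical lines on the graph.
    params += [("target", f"drawAsInfinite({e})") for e in events]
    return f"{base}/render?{urlencode(params)}"


# Usage (hypothetical host and trend metric):
#   render_url("http://graphite.example.com",
#              ["stats.logins.success"],
#              events=["events.deploy.site"],
#              options={"from": "-1hours"})
```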
  • 66. What Happened?
  • 67. Holt-Winters
  • 68. "Forecasting Sales by Exponentially Weighted Moving Averages". Peter
  • 69. "Aberrant Behavior Detection in Time Series for Network Monitoring".
  • 70. "Holt-Winters Forecasting Applied to Poisson Processes in Real-Time".
  • 71. holtWintersConfidence(Upper|Lower)
  • 72. holtWintersAberration
  • 73. business metrics with confidence bands == alertable business metrics
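The idea behind confidence bands is to forecast each point from past data, track how far off recent forecasts have been, and flag values that land outside forecast ± a multiple of that deviation. The sketch below is a deliberate simplification, not Holt-Winters itself: it uses single exponential smoothing and omits the trend and seasonal terms. The parameter names and defaults are assumptions chosen for illustration.

```python
def confidence_bands(series, alpha=0.5, gamma=0.5, delta=3.0):
    """Return a list of (value, lower, upper, is_aberrant) tuples.

    alpha smooths the forecast, gamma smooths the absolute deviation,
    and delta sets the band width in deviations.
    """
    if not series:
        return []
    results = []
    forecast = series[0]
    deviation = 0.0
    for value in series:
        lower = forecast - delta * deviation
        upper = forecast + delta * deviation
        # No verdict until some deviation has been observed.
        aberrant = deviation > 0 and not (lower <= value <= upper)
        results.append((value, lower, upper, aberrant))
        # Update only after judging, so each point is compared against
        # a forecast built from earlier points alone.
        deviation = gamma * abs(value - forecast) + (1 - gamma) * deviation
        forecast = alpha * value + (1 - alpha) * forecast
    return results
```

A business metric such as logins per minute that leaves its band is then something an alerting system can page on, which is the point of slide 73.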
  • 74-75. 16,000 metrics in GRAPHITE (plus 32,000 metrics in GANGLIA)
  • 76-78. Dashboards
  • 79. Hard: <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> </a>
  • 80. Easy! $g = new Graphite($time); $g->setTitle("File Not Found"); $g->addMetric("webs.errorLog.notExist", "#00cc00"); $g->showDeploys(true); echo $g->getDashboardHTML(280, 220);
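The "Easy!" version hides the URL plumbing of the "Hard" version behind a small helper object. Etsy's PHP `Graphite` class is not shown in the talk, so the following is only a guess at the shape of such a wrapper, written in Python; the class name, method names, and the 800x600 full-size default are assumptions.

```python
from html import escape
from urllib.parse import urlencode


class GraphiteGraph:
    """Hypothetical helper that emits a thumbnail linking to a large graph."""

    def __init__(self, base_url, time_range="-1hours"):
        self.base_url = base_url.rstrip("/")
        self.time_range = time_range
        self.title = ""
        self.targets = []
        self.colors = []

    def set_title(self, title):
        self.title = title

    def add_metric(self, target, color):
        self.targets.append(target)
        self.colors.append(color)

    def show_deploys(self, event_metrics):
        # Deploy events are overlaid as vertical lines.
        for metric in event_metrics:
            self.targets.append(f"drawAsInfinite({metric})")

    def render_url(self, width, height, hide_legend=False):
        params = [("from", self.time_range), ("width", width),
                  ("height", height), ("title", self.title), ("yMin", 0)]
        if hide_legend:
            params.append(("hideLegend", 1))
        params += [("target", t) for t in self.targets]
        if self.colors:
            params.append(("colorList", ",".join(self.colors)))
        return f"{self.base_url}/render?{urlencode(params)}"

    def dashboard_html(self, width, height):
        # Escape the URLs so '&' is valid inside HTML attributes.
        full = escape(self.render_url(800, 600))
        thumb = escape(self.render_url(width, height, hide_legend=True))
        return f'<a href="{full}"><img src="{thumb}"></a>'
```

Adding a graph to a dashboard then takes a handful of lines, which is the kind of friction reduction the talk argues for.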
  • 81. 48 dashboards by 32 engineers
  • 82. Application health
  • 83. High-level visibility
  • 84. Low MTTD
  • 85. Confidence
  • 86-88. Make metrics
  • 89. Not that much
  • 90. codeascraft.etsy.com, github.com/etsy/statsd, github.com/etsy/logster, bitbucket.org/maplebed/ganglia-logtailer
  • 91. Questions?