Metrics-Driven Engineering
Presented at Web 2.0 Expo, Oct. 13 2011


Published in: Technology
Transcript

  • 1. Metrics-Driven Engineering. Mike Brittain (@mikebrittain), Director of engineering, Infrastructure. October 13, 2011
  • 2. Tools and Process at Etsy
  • 3. How do people use Etsy? How many new visits? How many listings created? How many registrations? How many convos sent? How many purchases? How many new shops?
  • 4. What is the application doing? Search indexing? How fast are pages generating? Async tasks currently in queue? Developer API auth and rate limiting? Images resized and stored? Error and warning rates?
  • 5. Are the servers in good shape? Replication slave lag? Memcache hits/misses? Available connections? Database queries per second? Total outgoing bandwidth? CPU, Memory, I/O?
  • 6. Business Metrics
  • 7. Application Metrics
  • 8. System Metrics
  • 9. Visibility EVERYWHERE
  • 10. Constant Change
  • 11. $314 Million GMS 2010; $180 Million GMS 2009; $87 Million GMS 2008; $26 Million GMS 2007. Credit: pentarux (flickr)
  • 12. 25 Million Unique Visitors. 1 Billion page views per month. Credit: pentarux (flickr)
  • 13. Engineering team grew 500% over 18 months. Credit: martin_heigan (flickr)
  • 14. Less talk, more do.
  • 15. Always Be Shipping. Credit: ibailemon (flickr)
  • 16. Always Be Shipping (even if it’s your first day). Credit: ibailemon (flickr)
  • 17. 90+ Engineers. 40+ Deploys / day. Credit: misswired (flickr)
  • 18. credit: digidave (flickr)
  • 19. Code Reviews
  • 20. Automated Tests
  • 21. Config Flags: enable and disable features quickly. $cfg = array('checkout' => array('enabled' => 'on'), 'homepage' => array('enabled' => 'on'), 'profiles' => array('enabled' => 'on'), 'new_search' => array('enabled' => 'off'));
  • 22. Config Flags: enable and disable features quickly. Plus “admin-only,” percentage ramp-up, A/B testing, whitelists, blacklists, etc.
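The flag table on slides 21 and 22 can be sketched in a few lines. The following is a minimal illustration, not Etsy's implementation: the `rampup` state, the `percent` and `whitelist` keys, the `new_cart` feature and the helper names are all assumptions made for the example. It hashes the user name so that a given user stays in the same ramp-up bucket from one request to the next.

```python
import zlib

# Hypothetical flag table mirroring the slide's $cfg array. The "rampup"
# state and the "percent"/"whitelist" keys are assumptions for this sketch.
CFG = {
    "checkout": {"enabled": "on"},
    "homepage": {"enabled": "on"},
    "profiles": {"enabled": "on"},
    "new_search": {"enabled": "off"},
    "new_cart": {"enabled": "rampup", "percent": 10, "whitelist": {"alice"}},
}


def bucket(feature: str, user: str) -> int:
    """Map (feature, user) to a stable bucket in the range 0..99."""
    return zlib.crc32(f"{feature}:{user}".encode("utf-8")) % 100


def is_enabled(feature: str, user: str = "", cfg: dict = CFG) -> bool:
    """Return True if the feature should be active for this user."""
    flag = cfg.get(feature)
    if flag is None:
        return False  # unknown features default to off
    state = flag.get("enabled", "off")
    if state == "on":
        return True
    if state == "rampup":
        if user in flag.get("whitelist", ()):
            return True  # whitelisted users always see the feature
        return bucket(feature, user) < flag.get("percent", 0)
    return False
```

Calling code would wrap a feature in `if is_enabled("new_search", username):`, so turning the feature off is a config change rather than a code change.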
  • 23. Failure is not an option
  • 24. Failure is not an option. It is inevitable!
  • 25. Failure is not an option. It is inevitable! A learning opportunity!
  • 26. Failure is not an option. It is inevitable! A learning opportunity! DETECTABLE!
  • 27. Access
  • 28. Detect problems quickly
  • 29. CONFIDENCE
  • 30. A: Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 31. Engineers build the application
  • 32. OPS and ENG: Logging, Graphing, Trending, Alerting
  • 33. “Engineers are too busy writing features to build metrics.”
  • 34. Metrics are part of every feature ...and so are config flags
  • 35. Dead Simple
  • 36. Simple, open source tools
  • 37. Cacti (network, SNMP); Ganglia (machines); Graphite (application); Splunk (log analysis, nightly reports); Nagios (alerting); Logging; Logster; StatsD
  • 38. Ganglia
  • 39. Ganglia: cluster-oriented; huge set of community-contributed recipes; custom metrics (gmetad)
  • 40. Graphite
  • 41. Graphite: single-instance; create new metrics on the fly; customize via URLs and display functions
  • 42. Logging
  • 43. It’s 2:48 PM. Do you know where your logs are?
  • 44. Logger::log_error("User login failed. Reason: $msg for $username", "login");
  • 46. web0054 [Fri Mar 04 16:27:48 2011][error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 52. LogFormat "%h %l %u %t \"%r\" %>s %b" common
  • 53. LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 54. apache_note()
  • 58. grep "/listing/" access.log | awk '{sum=sum+$(NF-2)} END {print sum/NR}'
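The one-liner on slide 58 averages the third-from-last field of every access-log line that mentions "/listing/". If the log uses the custom LogFormat from slide 53, that field would be %{php_memory_usage_bytes}n. Below is a rough Python equivalent; the function name and its parameters are made up for this sketch. Unlike awk, it skips lines whose field is not numeric instead of counting them as zero.

```python
from typing import Iterable, Optional


def mean_field(lines: Iterable[str], needle: str = "/listing/",
               offset_from_end: int = 2) -> Optional[float]:
    """Average one numeric field, counted back from the end of each line.

    offset_from_end=2 selects the third-from-last whitespace-separated
    field, the same one awk addresses as $(NF-2).
    """
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue  # plays the role of the grep filter
        fields = line.split()
        if len(fields) <= offset_from_end:
            continue  # line too short to contain the field
        try:
            total += float(fields[-1 - offset_from_end])
        except ValueError:
            continue  # e.g. "-" logged when the note was never set
        count += 1
    return total / count if count else None
```

With a real log this could be called as `mean_field(open("access.log"))`, since a file object iterates line by line.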
  • 59. web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
        web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
        web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
        web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
        web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
        web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
        web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
        web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
        web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
        web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
        web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
        web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
        web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
        web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
        web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
        web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
  • 60. Logster: Fatals, Errors, Warnings
  • 61. Logster: run by cron; keeps a cursor on your log file; aggregate lines any way you want; output to Ganglia or Graphite; simple parsers. github.com/etsy
  • 62. web0054 [Fri Mar 04 16:27:48 2011][error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 63. ^.+ \[.+\] \[(?P<log_level>.+)\]
  • 64. if (fields['log_level'] == "fatal"):
            self.fatals += 1
        elif (fields['log_level'] == "error"):
            self.errors += 1
        elif (fields['log_level'] == "warning"):
            self.warnings += 1
        ...
  • 65. MetricObject("fatals", (self.fatals / self.duration), "per sec")
        MetricObject("errors", (self.errors / self.duration), "per sec")
        MetricObject("warning", (self.warnings / self.duration), "per sec")
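Slides 62 to 65 outline a Logster parser: match each line, count by log level, then report a rate per second. The sketch below shows the same idea as standalone Python and does not use the Logster library itself. The class and method names are invented for illustration. The regular expression is a tightened variant of the one on slide 63, written on the assumption that the level is always the second bracketed field.

```python
import re
from collections import Counter

# Tightened variant of the slide's pattern (an assumption of this sketch):
# host, one bracketed timestamp, then the bracketed log level.
LINE_RE = re.compile(r"^\S+ \[[^\]]+\]\s*\[(?P<log_level>[^\]]+)\]")

LEVELS = ("fatal", "error", "warning")


class LevelCounter:
    """Counts log lines per level and converts the counts to rates."""

    def __init__(self) -> None:
        self.counts = Counter()

    def parse_line(self, line: str) -> None:
        match = LINE_RE.match(line)
        if match is None:
            return  # ignore lines that do not look like log entries
        level = match.group("log_level")
        if level in LEVELS:
            self.counts[level] += 1

    def rates(self, duration: float) -> dict:
        """Return events per second for each level over `duration` seconds."""
        return {f"{level}s": self.counts[level] / duration for level in LEVELS}
```

Run from cron once a minute over the new lines only, `rates(60)` would yield the three numbers that slide 65 wraps in MetricObject.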
  • 66. Fatals Errors Warnings
  • 67. StatsD
  • 68. StatsD: network daemon (node.js); accepts data over UDP; flushes to Graphite every 10 sec; one line of code. github.com/etsy
  • 69. StatsD::increment("logins.success");
  • 70. StatsD::increment("logins.success"); (graph: logins)
  • 71. StatsD::timing("gearman.time", $msec);
  • 72. StatsD::timing("gearman.time", $msec); (graph: 90th pct, average, lower)
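What the one-line calls on slides 69 to 72 put on the wire is small: a UDP datagram of the form name:value|type. The sketch below is a minimal client in Python, not Etsy's PHP client. The function names are made up for the example, and the host and port defaults assume a StatsD daemon listening locally on its usual port, 8125.

```python
import socket


def counter_packet(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram, e.g. b"logins.success:1|c"."""
    return f"{name}:{value}|c".encode("ascii")


def timing_packet(name: str, msec: int) -> bytes:
    """Build a StatsD timer datagram, e.g. b"gearman.time:320|ms"."""
    return f"{name}:{msec}|ms".encode("ascii")


def send(packet: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one datagram to the daemon and do not wait for any reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    finally:
        sock.close()
```

Because UDP needs no connection and no acknowledgement, `send(counter_packet("logins.success"))` costs the application very little, and a stopped daemon does not block the request.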
  • 73. Ad hoc: name value timestamp
  • 74. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
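The shell command on slide 74 writes one "name value timestamp" line to Graphite's plaintext listener on port 2003. A deploy script written in Python could do the same with the sketch below. The function names are hypothetical and the host must be supplied by the caller.

```python
import socket
import time
from typing import Optional


def graphite_line(name: str, value, timestamp: Optional[int] = None) -> bytes:
    """Format one plaintext-protocol line: b"name value timestamp\\n"."""
    if timestamp is None:
        timestamp = int(time.time())  # same as `date +%s` in the slide
    return f"{name} {value} {int(timestamp)}\n".encode("ascii")


def send_metric(name: str, value, host: str, port: int = 2003,
                timestamp: Optional[int] = None, timeout: float = 5.0) -> None:
    """Open a TCP connection, write a single line, and close it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(graphite_line(name, value, timestamp))
```

A deploy hook would then call `send_metric("events.deploy.site", 1, host)` with the Graphite host name as the third argument.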
  • 75. Vertical Line Technology! target=drawAsInfinite(events.deploy.site)
  • 76. We could stare at graphs all day...
  • 77. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
  • 78. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
        webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.0,1.0,1.0,None
  • 79. Holt-Winters Confidence Bands (upper, lower)
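The rawData response on slide 78 is easy to consume from a script: a header of target, start, end and step, then a pipe, then comma-separated values, with None marking a missing point. The parser below is a small sketch of that reading. The type and function names are invented for the example, and fetching the URL is left out.

```python
from typing import List, NamedTuple, Optional


class RawSeries(NamedTuple):
    target: str
    start: int   # epoch seconds of the first datapoint
    end: int     # epoch seconds at the end of the window
    step: int    # seconds between datapoints
    values: List[Optional[float]]


def parse_raw(line: str) -> RawSeries:
    """Parse one "target,start,end,step|v1,v2,..." line."""
    header, _, body = line.strip().partition("|")
    # Split from the right so a target containing commas stays intact.
    target, start, end, step = header.rsplit(",", 3)
    values = [None if v == "None" else float(v)
              for v in body.split(",") if v]
    return RawSeries(target, int(start), int(end), int(step), values)


def latest(series: RawSeries) -> Optional[float]:
    """Return the most recent datapoint that is not missing."""
    for value in reversed(series.values):
        if value is not None:
            return value
    return None
```

Applied to the slide's sample, `latest()` would skip the trailing None and return the last real reading.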
  • 80. Holt-Winters Aberration
  • 81. Business metrics + Confidence bands = Alertable metrics
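Slide 81's sum can be read as: once a metric has upper and lower bands, an alert is just a check for points that fall outside them. The sketch below assumes the bands have already been computed elsewhere and arrive as lists aligned with the metric. It does not implement Holt-Winters forecasting, and the function names, window and threshold are illustrative choices.

```python
from typing import List, Optional, Sequence


def aberrations(values: Sequence[Optional[float]],
                lower: Sequence[Optional[float]],
                upper: Sequence[Optional[float]]) -> List[int]:
    """Return the indices of points lying outside [lower, upper]."""
    outside = []
    for i, (v, lo, hi) in enumerate(zip(values, lower, upper)):
        if v is None or lo is None or hi is None:
            continue  # missing data is neither normal nor aberrant
        if v < lo or v > hi:
            outside.append(i)
    return outside


def should_alert(values, lower, upper,
                 window: int = 5, threshold: int = 3) -> bool:
    """Alert when at least `threshold` of the last `window` points are outside."""
    n = min(len(values), len(lower), len(upper))
    first_recent = max(0, n - window)
    recent = [i for i in aberrations(values, lower, upper) if i >= first_recent]
    return len(recent) >= threshold
```

Requiring several aberrant points within a short window is one way to avoid paging on a single noisy sample. The right numbers depend on the metric.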
  • 82. 40,000+ metrics at Etsy Systems, Applications, Business
  • 83. Dashboards
  • 84. Dashboards
  • 85. Kind of Hard :-/ <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"></a>
  • 86. Super Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); echo $g->getDashboardHTML(280, 220);
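The contrast between slides 85 and 86 is a small helper that assembles the render URL. The sketch below shows one way such a helper could look in Python. It is not the PHP Graphite class from the slide: the function names, parameters and the example host are all assumptions.

```python
from html import escape
from urllib.parse import urlencode


def render_url(base, targets, title, width, height,
               since="-1hours", colors=(), hide_legend=False):
    """Build a Graphite render URL with one `target` parameter per series."""
    params = [("from", since), ("width", width), ("height", height),
              ("title", title), ("yMin", 0)]
    if hide_legend:
        params.append(("hideLegend", 1))
    params += [("target", t) for t in targets]
    if colors:
        params.append(("colorList", ",".join(colors)))
    return f"{base}/render?{urlencode(params)}"


def dashboard_html(base, targets, title, width, height, colors=()):
    """Return a thumbnail graph that links to a larger 800x600 version."""
    thumb = render_url(base, targets, title, width, height,
                       colors=colors, hide_legend=True)
    full = render_url(base, targets, title, 800, 600, colors=colors)
    return f'<a href="{escape(full)}"><img src="{escape(thumb)}"></a>'
```

A dashboard page could then emit `dashboard_html("http://graphite.example.com", ["webs.errorLog.notExist", "drawAsInfinite(deploys.web.production)"], "File or Script Not Found", 280, 220)` for each graph, with deploy lines drawn over every metric.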
  • 87. Metrics!
  • 88. Metrics! Metrics + Events
  • 89. Metrics! Metrics + Events. Metrics + Alerts
  • 90. Metrics! Metrics + Events. Metrics + Alerts. Metrics + Metrics
  • 91. High-level, real-time visibility
  • 92. Detect problems quickly
  • 93. CONFIDENCE
  • 94. Make them required features
  • 95. Make them dead simple
  • 96. Make them accessible
  • 97. Make them!
  • 98. Homework: codeascraft.etsy.com, github.com/etsy. Get in touch: mike @ etsy . com, @mikebrittain. We’re always looking for people who are interested in this kind of stuff... etsy.com/careers. Thank You
