Metrics Driven Engineering

Presented at USI 2013 in Paris, France.

In this talk I discuss how Etsy's engineering team collects and uses real-time metrics to add confidence to our Continuous Deployment culture.

Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.

http://www.etsy.com/careers

Transcript

  • 1. Metrics-Driven Engineering. Mike Brittain, ENGINEERING DIRECTOR, ETSY. @mikebrittain
  • 2. PROCESS AND TOOLS: Supporting a culture of Continuous Deployment
  • 3. How do people use Etsy? How many new visitors? How many listings created? How many registrations? How many messages sent? How many purchases? How many new shops?
  • 4. How is the application behaving? Search indexing? How fast are pages generating? Async tasks currently in queue? Developer API auth and rate limiting? Images resized and stored? Error and warning rates?
  • 5. Are the servers and network OK? Replication slave lag? Memcache hits/misses? Available connections? Database queries per second? Total outgoing bandwidth? CPU, Memory, I/O?
  • 6. Business Metrics
  • 7. Application Metrics
  • 8. System Metrics
  • 9. System Metrics
  • 10. Visibility EVERYWHERE
  • 11. Metrics help you identify goals
  • 12. Metrics help you identify goals... but also tell you when you've broken something.
  • 13. Always Be Shipping (credit: ibailemon, flickr)
  • 14. 1st day: Put yourself on the website.
  • 15. 2nd day: Complete tax, insurance, and benefits forms. (credit: ktpupp, flickr)
  • 16. Dev Sandbox, Trunk/master, Production. You! Test
  • 17. 7e9a814 -> 63a2bb3 Deploy to Production
  • 18. 50+ Deploys/day, 200+ Committers, 15 Product teams, 8 Infrastructure teams (credit: misswired, flickr)
  • 19. credit: digidave (flickr)
  • 20. Peer Review: Code reviews, Architecture reviews, Operability reviews
  • 21. Automated Tests: Static analysis, Unit tests, Integration tests, Functional tests
  • 22. May 2013: $102.9 Million in goods sold, 1.37 Billion page views. https://www.etsy.com/blog/news/2013/etsy-statistics-may-2013-weather-report/
  • 23. Failure is not an option
  • 24. Failure is [not an option] inevitable
  • 25. Failure is [not an option] inevitable and detectable!
  • 26. Access
  • 27. Q: Sounds like a lot of work, who's going to build all of this?
  • 28. Q: Sounds like a lot of work, who's going to build all of this? A: Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 29. Engineers build the application
  • 30. OPS | Logging, Graphing, Trending, Alerting | ENG
  • 31. Metrics are part of every feature (and so are config flags)
  • 32. Make it DEAD SIMPLE
  • 33. Simple, open-source tools: Ganglia (application, servers, network); Graphite (application); Logster* (application, servers); Statsd* (application); Cacti (network, SNMP); Log formats (application, servers); FITB* (network); Nagios (alerting). *github.com/etsy
  • 34. Ganglia
  • 35. Ganglia: Cluster-oriented. Huge community, contributed recipes. Custom metrics (gmetad).
  • 36. Graphite
  • 37. Graphite: Single-instance. Create new metrics on-the-fly. Customize via URLs and display functions. http://www.aosabook.org/en/graphite.html
  • 38. Log Formats
  • 39. Access Logs: Time, remote address, http method, request uri, referrer, user-agent, response size, response code, execution time, memory consumed, plus custom fields... • Signed-in/out (user_id vs. "-") • display mode ("desktop" vs. "mobile") • l10n/i18n ("en-US") • etc.
  • 40. LogFormat %l %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %{custom_field}n ...
        apache_note("custom_field", $whatever);
  • 41. LogFormat "%{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{display_mode}n %{user_id}n %{php_bytes}n %{php_usec}n %D"
        web0060 66.249.71.110 - - [11/May/2011:17:08:53 +0000] "GET /listing/12189259/tropical-etched-pair-of-lampwork-glass HTTP/1.1" 200 11034 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" desktop - 13399576 505780 554876
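The custom access-log fields are only useful if something reads them back out. As a minimal sketch, assuming the field order of the LogFormat on slide 41 and assuming the leading host name (web0060) is prepended by whatever collects the logs, a line can be split into named fields like this. The regular expression, the function name, and the `signed_in` flag are illustrative and not from the talk.

```python
import re

# Field order follows the LogFormat on slide 41. The leading host name is
# assumed to be added by the log collector, not by Apache itself.
ACCESS_LOG_RE = re.compile(
    r'^(?P<host>\S+) (?P<client_ip>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'(?P<display_mode>\S+) (?P<user_id>\S+) (?P<php_bytes>\d+) '
    r'(?P<php_usec>\d+) (?P<apache_usec>\d+)$'
)

def parse_access_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = ACCESS_LOG_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("status", "php_bytes", "php_usec", "apache_usec"):
        fields[key] = int(fields[key])
    # Slide 39: a signed-out request logs "-" in place of a user_id.
    fields["signed_in"] = fields["user_id"] != "-"
    return fields

SAMPLE = ('web0060 66.249.71.110 - - [11/May/2011:17:08:53 +0000] '
          '"GET /listing/12189259/tropical-etched-pair-of-lampwork-glass HTTP/1.1" '
          '200 11034 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)" desktop - 13399576 505780 554876')

if __name__ == "__main__":
    print(parse_access_line(SAMPLE))
```

Once the fields are named, questions such as "how slow are mobile pages for signed-in users" become a filter and an average over `php_usec`.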
  • 43. Logger::error("User login failed. Reason: $msg for $email_addr", "login");
        Method name denotes log "level": error, fatal, warning, notice, debug. A "namespace" parameter is provided so we can aggregate log entries with similar concerns.
  • 44. Logger::error("User login failed. Reason: $msg for $email_addr", "login");
        web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password was submitted for mike@etsy.com
        Fields: Server name, Date and time, Level, Namespace, Unique request ID
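To make the line layout on slide 44 concrete, here is a small sketch that produces the same bracketed format. It is not Etsy's Logger class (which is PHP); the function name and arguments are hypothetical, and the timestamp is rendered in UTC as an assumption, since the slide does not state a time zone.

```python
import time

# Levels listed on slide 43.
LEVELS = ("debug", "notice", "warning", "error", "fatal")

def format_entry(server, level, namespace, request_id, message, now=None):
    """Return one log line: server [time] [level] [namespace] [request id] message."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    # now=None means "current time"; UTC is an assumption of this sketch.
    stamp = time.strftime("%a %b %d %H:%M:%S %Y", time.gmtime(now))
    return f"{server} [{stamp}] [{level}] [{namespace}] [{request_id}] {message}"

if __name__ == "__main__":
    print(format_entry("web0054", "error", "login", "mk04gw1p71",
                       "User login failed. Reason: wrong password was submitted"))
```

Keeping level, namespace, and request ID in fixed positions is what lets a simple pattern match pick them out later.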
  • 45. (error log stream; lines are cut off at the slide edge)
        web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] Invalid charset conve
        web0102 [Fri Mar 04 16:27:48 2011] [warning] [login] [47dd608551] User login failed.
        web0012 [Fri Mar 04 16:27:48 2011] [warning] [login] [mk04gw1p71] User login failed.
        web0081 [Fri Mar 04 16:27:48 2011] [error] [register] [39e08e6692] Duplicate user ID
        web0100 [Fri Mar 04 16:27:49 2011] [fatal] [register] [f9c2b23702] Invalid charset co
        web0003 [Fri Mar 04 16:27:49 2011] [error] [register] [39e08e6692] Duplicate user ID
        web0050 [Fri Mar 04 16:27:49 2011] [error] [register] [2e468a9bb6] Duplicate user ID
        web0054 [Fri Mar 04 16:27:49 2011] [warning] [login] [mk04gw1p71] User login failed.
        web0200 [Fri Mar 04 16:27:49 2011] [error] [login] [f9c2b23702] User login failed. Re
        web0064 [Fri Mar 04 16:27:49 2011] [error] [login] [47dd608551] Duplicate user ID enc
        web0012 [Fri Mar 04 16:27:49 2011] [warning] [login] [32976da59c] User login failed.
        web0041 [Fri Mar 04 16:27:49 2011] [fatal] [login] [mk04gw1p71] Invalid charset conve
        web0012 [Fri Mar 04 16:27:49 2011] [error] [login] [2f297b40a5] User login failed. Re
        web0025 [Fri Mar 04 16:27:49 2011] [warning] [register] [32976da59c] User login faile
        web0088 [Fri Mar 04 16:27:49 2011] [warning] [register] [2e468a9bb6] User login faile
        web0050 [Fri Mar 04 16:27:50 2011] [warning] [register] [39e08e6692] User login faile
        web0035 [Fri Mar 04 16:27:50 2011] [warning] [login] [2f297b40a5] User login failed.
        web0072 [Fri Mar 04 16:27:50 2011] [error] [subscribe] [2f297b40a5] User login failed
        web0050 [Fri Mar 04 16:27:50 2011] [error] [login] [2e468a9bb6] User login failed. Re
        web0054 [Fri Mar 04 16:27:50 2011] [warning] [login] [mk04gw1p71] User login failed.
        web0200 [Fri Mar 04 16:27:50 2011] [error] [subscribe] [f9c2b23702] User login failed
        web0064 [Fri Mar 04 16:27:50 2011] [error] [subscribe] [47dd608551] Invalid charset c
        web0012 [Fri Mar 04 16:27:50 2011] [warning] [login] [32976da59c] User login failed.
        web0041 [Fri Mar 04 16:27:50 2011] [fatal] [login] [mk04gw1p71] Invalid charset conve
        web0012 [Fri Mar 04 16:27:50 2011] [error] [register] [2f297b40a5] Duplicate user ID
        web0025 [Fri Mar 04 16:27:50 2011] [warning] [login] [32976da59c] User login failed.
        web0088 [Fri Mar 04 16:27:50 2011] [warning] [login] [2e468a9bb6] User login failed.
        web0050 [Fri Mar 04 16:27:51 2011] [warning] [login] [39e08e6692] User login failed.
        web0035 [Fri Mar 04 16:27:51 2011] [warning] [login] [2f297b40a5] User login failed.
        web0072 [Fri Mar 04 16:27:51 2011] [error] [login] [2f297b40a5] User login failed. Re
  • 47. Logster: the same error log stream as slide 45, graphed as FATALS, ERRORS, WARNINGS
  • 48. Logster (github.com/etsy/logster): Run by cron (e.g. 1 min intervals). Keeps a cursor on your log file. Simple parsers. Parse and aggregate values however you want. Output to Ganglia, Graphite, Amazon CloudWatch.
  • 49. 1. Pattern match on fields of interest
        web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password was submitted for mike@etsy.com
        ^\S+ \[[^\]]+\] \[(?P<log_level>\w+)\]
  • 50. 2. Aggregate values (sum, average, percentile, etc.)
        if fields["log_level"] == "fatal":
            self.fatals += 1
        elif fields["log_level"] == "error":
            self.errors += 1
        elif fields["log_level"] == "warning":
            self.warnings += 1
        ...
  • 51. 3. Send the values as "metric objects" to the collectors
        MetricObject("fatals", (self.fatals / self.duration), "per sec")
        MetricObject("errors", (self.errors / self.duration), "per sec")
        MetricObject("warning", (self.warnings / self.duration), "per sec")
  • 52. Logster: FATALS, ERRORS, WARNINGS. github.com/etsy/logster
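The three Logster steps (pattern match, aggregate, emit per-second rates) can be sketched in a few lines of standalone Python. This is an illustration of the idea only: it does not import or subclass anything from Logster, and the class name, method names, and the returned dict are assumptions of this sketch.

```python
import re
from collections import Counter

# Step 1: match server name, bracketed timestamp, then the bracketed level.
LINE_RE = re.compile(r"^\S+ \[[^\]]+\] \[(?P<log_level>\w+)\]")

class LevelCounter:
    """Counts log lines per level and reports them as per-second rates."""

    def __init__(self):
        self.counts = Counter()

    def parse_line(self, line):
        # Step 2: aggregate. Lines that do not match are ignored.
        m = LINE_RE.match(line)
        if m:
            self.counts[m.group("log_level")] += 1

    def rates(self, duration_sec):
        # Step 3: values a collector (Graphite, Ganglia, ...) would receive.
        return {level: self.counts[level] / duration_sec
                for level in ("fatal", "error", "warning")}

if __name__ == "__main__":
    counter = LevelCounter()
    for line in [
        "web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] Invalid charset",
        "web0102 [Fri Mar 04 16:27:48 2011] [warning] [login] [47dd608551] User login failed.",
        "web0081 [Fri Mar 04 16:27:48 2011] [error] [register] [39e08e6692] Duplicate user ID",
    ]:
        counter.parse_line(line)
    print(counter.rates(60))
```

Dividing by the interval between runs keeps the graph comparable even if the cron schedule changes.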
  • 53. StatsD
  • 54. StatsD (github.com/etsy/statsd): Network daemon (node.js). Accepts data over UDP. Flushes to Graphite every 10 sec. One line of code.
  • 55. StatsD::increment("logins.success");
  • 56. StatsD::increment("logins.success"); (graph: Logins)
  • 57. StatsD::timing("profile.time", $msec);
  • 58. StatsD::timing("profile.time", $msec); (graph: 90th pct, average, lower)
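Behind the one-line calls, a StatsD client only formats a short text datagram and sends it over UDP. The sketch below shows that in Python rather than the PHP used on the slides; the helper names are hypothetical, and the host and port defaults (127.0.0.1 and 8125, StatsD's usual default) are assumptions about the local setup.

```python
import socket

def statsd_packet(name, value, kind):
    """Format one StatsD datagram: 'c' is a counter, 'ms' is a timer."""
    return f"{name}:{value}|{kind}".encode("ascii")

def send(packet, host="127.0.0.1", port=8125):
    # UDP is fire-and-forget: no connection, and no error if nobody listens.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))

def increment(name, host="127.0.0.1", port=8125):
    send(statsd_packet(name, 1, "c"), host, port)

def timing(name, msec, host="127.0.0.1", port=8125):
    send(statsd_packet(name, msec, "ms"), host, port)

if __name__ == "__main__":
    increment("logins.success")
    timing("profile.time", 320)
```

Because the send is UDP, instrumenting a code path cannot block or fail the request it measures, which is a large part of what makes it "dead simple" to adopt.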
  • 59. Ad hoc: name value timestamp
  • 60. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
  • 61. Vertical Line Technology! target=drawAsInfinite(events.deploy.site)
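The `echo ... | nc` one-liner on slide 60 writes a single "name value timestamp" line to Graphite's plaintext listener. A deploy script could do the same from Python, as in this sketch; the function names are hypothetical and the host is whatever your Graphite server is called.

```python
import socket
import time

def plaintext_line(name, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'name value timestamp'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{name} {value} {ts}\n"

def send_to_graphite(line, host, port=2003):
    # Port 2003 matches the nc command on slide 60.
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

if __name__ == "__main__":
    # Printed rather than sent, so the sketch runs without a Graphite server.
    print(plaintext_line("events.deploy.site", 1), end="")
```

Recording a 1 at deploy time is what lets `drawAsInfinite(events.deploy.site)` draw a vertical line on any other graph at that moment.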
  • 62. User Logins
  • 63. PHP Warnings
  • 64. PHP Fatal Errors
  • 65. 250,000+ metrics at Etsy: Systems, Applications, Business
  • 66. Dashboards: github.com/etsy/dashboard
  • 67. <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"><img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"></a> Kind of Hard :-/ github.com/etsy/dashboard
  • 68. $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); echo $g->getDashboardHTML(280, 220); Super Easy! github.com/etsy/dashboard
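The difference between the "Kind of Hard" and "Super Easy" slides is a small helper that assembles the render URL. The sketch below shows the idea in Python; it is not the etsy/dashboard code, and the function name, its parameters, and the example host are hypothetical. Only the query parameters themselves (`from`, `width`, `height`, `title`, `yMin`, `target`, `colorList`) come from the URL on slide 67.

```python
from urllib.parse import urlencode

def render_url(base, title, metric, color, deploy_events=(),
               width=280, height=220, since="-1hours"):
    """Build a Graphite render URL with optional deploy lines overlaid."""
    params = [
        ("from", since),
        ("width", width),
        ("height", height),
        ("title", title),
        ("yMin", 0),
        ("target", metric),
    ]
    # Each deploy event series becomes a vertical line on the graph.
    params += [("target", f"drawAsInfinite({event})") for event in deploy_events]
    params.append(("colorList", color))
    return f"{base}/render?{urlencode(params)}"

if __name__ == "__main__":
    print(render_url("http://graphite.example.com", "File Not Found",
                     "webs.errorLog.notExist", "#00cc00",
                     deploy_events=["deploys.web.production"]))
```

Centralizing the URL building also means every dashboard graph gets the deploy lines for free, instead of relying on each author to remember them.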
  • 69. But, you said... "250,000+ metrics at Etsy" (Systems, Applications, Business)
  • 70. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
  • 71. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
        webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.0,1.0,1.0,None
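With `rawData=1` Graphite returns text instead of an image, so a program can watch a metric that nobody has time to look at. Judging from the response on slide 71, each line is `name,start,end,step|v1,v2,...`, with `None` for missing points. A minimal parser under that assumption (the function name and the returned structure are illustrative):

```python
def parse_raw_data(text):
    """Parse Graphite rawData output into {name: {start, end, step, values}}."""
    series = {}
    for line in text.strip().splitlines():
        header, _, body = line.partition("|")
        # Split from the right: a target name may itself contain commas.
        name, start, end, step = header.rsplit(",", 3)
        values = [None if v == "None" else float(v) for v in body.split(",")]
        series[name] = {
            "start": int(start),
            "end": int(end),
            "step": int(step),
            "values": values,
        }
    return series

if __name__ == "__main__":
    sample = "webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,None"
    print(parse_raw_data(sample))
```

In the slide's response the window is 3,600 seconds at a 60-second step, which is one value per minute for the last hour.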
  • 72. Holt-Winters Confidence Bands (lower, upper)
  • 73. Holt-Winters Aberration
  • 74. Business metrics + Confidence bands = Alertable metrics
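"Business metrics + confidence bands = alertable metrics" comes down to checking whether recent values fall outside the band. The sketch below assumes the lower and upper bands were already computed elsewhere (for example, fetched as series from Graphite's Holt-Winters functions); it does not implement Holt-Winters forecasting itself. The function names and the "three consecutive points" rule are assumptions for illustration, not something stated in the talk.

```python
def aberrations(values, lower, upper):
    """Per point: distance outside the band (0.0 if inside, None if missing)."""
    out = []
    for v, lo, hi in zip(values, lower, upper):
        if v is None or lo is None or hi is None:
            out.append(None)
        elif v > hi:
            out.append(v - hi)   # positive: above the upper band
        elif v < lo:
            out.append(v - lo)   # negative: below the lower band
        else:
            out.append(0.0)
    return out

def should_alert(values, lower, upper, min_points=3):
    """Alert only when the last min_points values are all outside the band."""
    recent = aberrations(values, lower, upper)[-min_points:]
    return (len(recent) == min_points
            and all(d is not None and d != 0 for d in recent))

if __name__ == "__main__":
    logins = [10, 12, 30, 31, 32]
    print(should_alert(logins, [8] * 5, [15] * 5))
```

Requiring several consecutive out-of-band points trades a little detection delay for fewer pages caused by a single noisy sample.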
  • 75. Metrics! Metrics + Events, Metrics + Alerts, Metrics + Metrics
  • 76. High-level, real-time visibility
  • 77. Detect problems early, and resolve them quickly.
  • 78. Make them accessible. Make them required features. Make them dead simple.
  • 79. Merci! Metrics-Driven Engineering. These slides will be available at mikebrittain.com/talks. codeascraft.etsy.com, github.com/etsy. Say "Hello!": mike@etsy.com, @mikebrittain