Metrics driven engineering (velocity 2011)
 

    Metrics driven engineering (velocity 2011) Presentation Transcript

    • METRICS-DRIVEN ENGINEERING at Etsy. Kellan Elliott-McCrea, VP of Eng. kellan@etsy.com @kellan. Tuesday, June 5, 12
    • What is Etsy?
    • 8.5+ million items in the marketplace
    • 400,000+ active
    • $300+ million in sales in 2010 ~$41 million/month
    • > $1000 / minute
    • > 1 billion page views / month
    • business in over 150 countries
    • deploy the site, every ~20 minutes
    • engineering team grew ~4x in 2010
    • Metrics?
    • Logs, Graphs, Trends, and Correlations
    • Metrics Driven?
    • Making Decisions
    • How many visitors are using this thing?
    • Can we deploy that to 100% of our visitors?
    • Did we make it faster?
    • Did I just break something?
    • Q. WHO MAKES THESE GRAPHS? A. Well, the Ops team manages the racks, the network, the servers, installed monitoring tools, wears the pagers, blah, blah, blah...
    • but... Engineers build the application.
    • Dev + Ops
    • ACCESS
    • Yes! No.
    • “Engineers are too busy!”
    • Here’s the BIG SECRET...
    • ... MAKE IT EASY!
    • Simple, open source tools
    • Cacti (network, SNMP), Ganglia (machines), Graphite (application), Splunk (log analysis, nightly reports), Nagios (alerting)
    • Ganglia ★ cluster oriented ★ huge community contributed recipes ★ 2.0 released today (including several Flickr and Etsy patches!) ★ gmetad makes it easy to track custom metrics
    • Graphite ★ super flexible collection and display ★ per metrics buckets ★ single instance ★ super easy to write and use custom display functions
    • Logging
    • Logger::log_error("User login failed. Reason: $msg for $username", "login");
    • web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [14531658] User login failed. Reason: wrong password for ...
    • web0054 [Fri Mar 04 16:27:48 2011] [info] [login] [14531658] User login failed. Reason: wrong password for ...
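A minimal Python sketch of how a logger might assemble a line in the layout shown on these slides (host, timestamp, level, namespace, a numeric id, message). The function name `format_log_line` and the parameter name `ident` are hypothetical; the slides do not say what the bracketed number represents or how Etsy's `Logger` builds the line.

```python
from datetime import datetime


def format_log_line(host, when, level, namespace, ident, message):
    """Render one log line in the bracketed layout shown on the slides.

    `ident` stands in for the bracketed number in the sample lines; its
    real meaning is not stated in the deck, so it is passed through as-is.
    """
    stamp = when.strftime("%a %b %d %H:%M:%S %Y")
    return f"{host} [{stamp}] [{level}] [{namespace}] [{ident}] {message}"


if __name__ == "__main__":
    print(format_log_line(
        "web0054",
        datetime(2011, 3, 4, 16, 27, 48),
        "info",
        "login",
        14531658,
        "User login failed. Reason: wrong password for ...",
    ))
```

Keeping the level and namespace in fixed bracketed positions is what makes the lines cheap to count later with a simple pattern match.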
    • Counting and Timing http://code.flickr.com/blog/2008/10/27/counting-timing/
    • Logster https://github.com/etsy/logster
    • Forked from ganglia-logtailer: minus daemon mode (cron mode only), plus support for Graphite, plus simplified parsing scripts
    • web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0201 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a grue.
      web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
      web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
      web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
      web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
      web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
      web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
      web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
      web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
      web0003 [04:28:54 2011] [error] [client 10.101.x.x] Youve been eaten by a grue.
      web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
      web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling
    • Fatals, Errors, Warnings
    • ★ runs out of cron ★ maintains a cursor into log files ★ supports ganglia and graphite ★ custom parsers much easier to write than gmetad
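The cron-plus-cursor idea can be sketched in a few lines of Python. This is not Logster's code or its parser API: `read_new_lines`, `count_levels`, and `to_graphite` are hypothetical names, and the `webs.errorLog` prefix is borrowed from a metric name that appears later in the deck. Each run reads only what was appended since the previous run, counts lines per level, and formats the counts as `name value timestamp` lines.

```python
import os
import re
import time

LEVEL_RE = re.compile(r"\[(warning|error|fatal)\]")


def read_new_lines(log_path, cursor_path):
    """Return lines appended since the last run and advance the saved cursor."""
    offset = 0
    if os.path.exists(cursor_path):
        with open(cursor_path) as f:
            offset = int(f.read().strip() or 0)
    # A smaller file than the saved offset suggests rotation: start over.
    if offset > os.path.getsize(log_path):
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()
    with open(cursor_path, "w") as f:
        f.write(str(new_offset))
    return data.decode("utf-8", errors="replace").splitlines()


def count_levels(lines):
    """Count lines per severity level; lines without a known level are skipped."""
    counts = {"warning": 0, "error": 0, "fatal": 0}
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def to_graphite(counts, prefix, now=None):
    """Format counts as 'name value timestamp' lines, sorted by level name."""
    now = int(time.time()) if now is None else int(now)
    return [f"{prefix}.{level} {n} {now}" for level, n in sorted(counts.items())]
```

A cron entry would call these three functions once a minute. The sketch ignores a partially written last line, which a production tailer would need to handle.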
    • Apache access logs
    • LogFormat "%h %l %u %t \"%r\" %>s %b" common
    • LogFormat "%{X-Forwarded-For}i %{True-Client-IP}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
    • %{etsy_ab_selections}n
    • %{etsy_uaid}n
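As a rough illustration of why a fixed access-log layout is useful, here is a small Python parser for the standard `common` format only. The regular expression and the names `parse_common` and `status_counts` are assumptions made for this sketch; the Etsy-specific fields in the extended format are not parsed because the deck does not show what their values look like.

```python
import re
from collections import Counter

# Matches: %h %l %u %t "%r" %>s %b
CLF_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_common(line):
    """Return the fields of one common-format line as a dict, or None."""
    match = CLF_RE.match(line)
    return match.groupdict() if match else None


def status_counts(lines):
    """Count responses per HTTP status code, skipping unparseable lines."""
    parsed = (parse_common(line) for line in lines)
    return Counter(p["status"] for p in parsed if p)
```

Extra fields appended after `%b`, as in the extended format on the slide, would each need their own capture group added to the pattern in the same order.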
    • Graphs
    • “If Engineering at Etsy has a religion, it’s the Church of Graphs. If it moves, we track it.” - Erik Kastner http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
    • StatsD https://github.com/etsy/statsd/
    • StatsD::increment("logins.success"); StatsD::timing("gearman.time", $msec);
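On the wire, the two calls above come down to tiny UDP datagrams in StatsD's text format: `name:value|c` for counters and `name:value|ms` for timers. The Python sketch below shows that shape. The helper names are hypothetical, and the host default is an assumption (8125 is StatsD's usual port).

```python
import socket


def counter_packet(name, delta=1):
    """Build a StatsD counter datagram, e.g. b'logins.success:1|c'."""
    return f"{name}:{delta}|c".encode("ascii")


def timing_packet(name, msec):
    """Build a StatsD timer datagram, e.g. b'gearman.time:320|ms'."""
    return f"{name}:{msec}|ms".encode("ascii")


def send(packet, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; errors are swallowed on purpose so that
    instrumentation can never break the code path being measured."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(packet, (host, port))
    except OSError:
        pass
    finally:
        sock.close()


if __name__ == "__main__":
    send(counter_packet("logins.success"))
    send(timing_packet("gearman.time", 320))
```

Because UDP needs no connection and no reply, adding a counter costs the application almost nothing, which fits the "make it easy" theme of the talk.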
    • 90th pct, average, lower: StatsD::timing("gearman.time", $msec);
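The three series named on this slide can be derived from the raw timings collected in one flush interval. The Python sketch below uses a nearest-rank percentile; that choice, the function name `summarize`, and the result keys are assumptions for illustration, and StatsD's own calculation may differ in detail.

```python
def summarize(timings_ms, pct=90):
    """Summarize one interval of timings as lower, average and pct-th percentile."""
    if not timings_ms:
        raise ValueError("need at least one timing")
    ordered = sorted(timings_ms)
    # Nearest-rank percentile: the smallest sample with at least pct% of
    # all samples at or below it. Integer arithmetic avoids float rounding.
    rank = max(1, -(-pct * len(ordered) // 100))
    return {
        "lower": ordered[0],
        "average": sum(ordered) / len(ordered),
        f"pct{pct}": ordered[rank - 1],
    }
```

Plotting the percentile next to the average shows the slow tail that an average alone would hide.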
    • Ad hoc: name value timestamp
    • echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
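The shell one-liner above writes a single `name value timestamp` line to Graphite's plaintext listener on port 2003. A Python equivalent might look like the sketch below; the function names are hypothetical and the caller supplies the host.

```python
import socket
import time


def graphite_line(name, value, timestamp=None):
    """Format one metric in Graphite's plaintext form: 'name value timestamp\\n'."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{name} {value} {ts}\n"


def send_to_graphite(line, host, port=2003, timeout=2.0):
    """Open a TCP connection, write the line, and close the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

A deploy script could call `send_to_graphite(graphite_line("events.deploy.site", 1), host)` as its last step, so every deploy leaves a timestamped mark that graphs can overlay.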
    • Correlations
    • echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
    • Trends + Events: target=drawAsInfinite(events.deploy.site)
    • What Happened?
    • Holt-Winters
    • "Forecasting Sales by Exponentially Weighted Moving Averages". Peter
    • "Aberrant Behavior Detection in Time Series for Network Monitoring".
    • "Holt-Winters Forecasting Applied to Poisson Processes in Real-Time".
    • holtWintersConfidence(Upper|Lower)
    • holtWintersAberration
    • business metrics with confidence bands == alertable business metrics
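To make "confidence bands" concrete, here is a compact Python sketch of additive Holt-Winters smoothing with a smoothed-deviation band, in the spirit of the aberrant-behavior paper cited above. It is not Graphite's implementation. The function name, the smoothing constants, the band width `delta`, and the crude initialization (level from the first sample, zero trend and seasonality) are all assumptions chosen for illustration.

```python
def holt_winters_bands(series, season_len, alpha=0.1, beta=0.0035,
                       gamma=0.1, delta=3.0):
    """Return (predicted, lower, upper, aberrant) for each point in series.

    A point is flagged aberrant when it falls outside
    predicted +/- delta * smoothed deviation. Nothing is flagged during
    the first two seasons, while the model is still warming up.
    """
    if season_len < 1 or len(series) < season_len:
        raise ValueError("need at least one full season of data")
    level = float(series[0])
    trend = 0.0
    seasonal = [0.0] * season_len   # seasonal offset per slot in the cycle
    deviation = [0.0] * season_len  # smoothed absolute error per slot
    results = []
    for t, y in enumerate(series):
        slot = t % season_len
        # Forecast made before seeing y, using last season's values.
        predicted = level + trend + seasonal[slot]
        band = delta * deviation[slot]
        lower, upper = predicted - band, predicted + band
        aberrant = t >= 2 * season_len and not (lower <= y <= upper)
        results.append((predicted, lower, upper, aberrant))
        # Update level, trend, seasonal offset and deviation with y.
        prev_level = level
        level = alpha * (y - seasonal[slot]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[slot] = gamma * (y - level) + (1 - gamma) * seasonal[slot]
        deviation[slot] = gamma * abs(y - predicted) + (1 - gamma) * deviation[slot]
    return results
```

Applied to a business metric such as sales per minute, the `aberrant` flag is what turns a graph into something a pager can act on: the alert threshold follows the daily rhythm instead of being a fixed number.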
    • 16,000 metrics in GRAPHITE (plus 32,000 metrics in GANGLIA)
    • Dashboards
    • Hard <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600"> </a>
    • Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); $g->showDeploys(true); echo $g->getDashboardHTML(280, 220);
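The gap between "Hard" and "Easy!" is mostly URL assembly. The Python sketch below shows how such a helper might build a Graphite render URL and the thumbnail-links-to-full-size HTML. It is not Etsy's `Graphite` PHP class: `render_url`, `dashboard_html`, and the `graphite.example.com` host are hypothetical, and only query parameters visible in the hand-written URL above are used.

```python
from html import escape
from urllib.parse import urlencode


def render_url(base, targets, title, width, height,
               from_="-1hours", colors=None, hide_legend=False):
    """Build a Graphite /render URL with one 'target' parameter per series."""
    params = [("from", from_), ("width", width), ("height", height),
              ("title", title), ("yMin", 0)]
    if hide_legend:
        params.append(("hideLegend", 1))
    params += [("target", t) for t in targets]
    if colors:
        params.append(("colorList", ",".join(colors)))
    return f"{base}/render?{urlencode(params)}"


def dashboard_html(base, metric, title, deploy_events=(),
                   thumb=(280, 220), full=(800, 600)):
    """Return a thumbnail graph that links to a full-size version.

    Each deploy event is wrapped in drawAsInfinite() so it shows up as a
    vertical line over the metric.
    """
    targets = [metric] + [f"drawAsInfinite({e})" for e in deploy_events]
    big = render_url(base, targets, title, *full)
    small = render_url(base, targets, title, *thumb, hide_legend=True)
    return f'<a href="{escape(big)}"><img src="{escape(small)}"></a>'


if __name__ == "__main__":
    print(dashboard_html(
        "http://graphite.example.com",
        "webs.errorLog.notExist",
        "File Not Found",
        deploy_events=["deploys.web.production"],
    ))
```

Hiding the encoding and the deploy overlays behind one call is the kind of convenience that lets an engineer add a dashboard in a few lines.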
    • 48 dashboards by 32 engineers
    • Application health
    • High-level visibility
    • Low MTTD
    • Confidence
    • Make metrics
    • Not that much
    • codeascraft.etsy.com, github.com/etsy/statsd, github.com/etsy/logster, bitbucket.org/maplebed/ganglia-logtailer
    • Questions?