Metrics-Driven Engineering

Mike Brittain        @ mikebrittain
Director of Engineering, Infrastructure

                                          October 13, 2011
Tools and Process at Etsy
How do people use Etsy?
  How many new visits?
  How many listings created?
  How many registrations?
  How many convos sent?
  How many purchases?
  How many new shops?
What is the application doing?
  Search indexing?
  How fast are pages generating?
  Async tasks currently in queue?
  Developer API auth and rate limiting?
  Images resized and stored?
  Error and warning rates?
Are the servers in good shape?
  Replication slave lag?
  Memcache hits/misses?
  Available connections?
  Database queries per second?
  Total outgoing bandwidth?
  CPU, Memory, I/O?
Business Metrics
Application Metrics
System Metrics
Visibility EVERYWHERE
Constant Change
$314 Million GMS 2010
  $180 Million GMS 2009
  $87 Million GMS 2008

  $26 Million GMS 2007




credit: pentarux (flickr)
25 Million Unique Visitors
  1 Billion page views per month




credit: pentarux (flickr)
Engineering team grew 500%
                        over 18 months


credit: martin_heigan (flickr)
Less talk, more do.
Always Be Shipping
                             (even if it’s your first day)




credit: ibailemon (flickr)
90+ Engineers
                     40+ Deploys / day

credit: misswired (flickr)
credit: digidave (flickr)
Code Reviews
Automated Tests
$cfg = array(
   'checkout' => array('enabled' => 'on'),
   'homepage' => array('enabled' => 'on'),
   'profiles' => array('enabled' => 'on'),
   'new_search' => array('enabled' => 'off'),
);


                          Config Flags
Enable and disable features quickly
Plus “admin-only,” percentage ramp-up, A/B testing,
whitelists, blacklists, etc...
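A minimal sketch of how a flag check with percentage ramp-up might look. It is written in Python rather than the PHP on the slide, and the `rampup` state, the `percent` key, the `new_profile` flag, and the `feature_enabled` helper are all assumptions made for illustration, not Etsy's actual config API.

```python
import zlib

# Hypothetical flag table loosely mirroring the $cfg array above.
# The "rampup" state and "percent" key are assumptions for this sketch.
CFG = {
    "checkout": {"enabled": "on"},
    "new_search": {"enabled": "off"},
    "new_profile": {"enabled": "rampup", "percent": 25},
}


def feature_enabled(name, user_id, cfg=CFG):
    """Return True if the named feature is on for this user."""
    flag = cfg.get(name)
    if flag is None:
        return False  # unknown flags are treated as off
    state = flag["enabled"]
    if state == "on":
        return True
    if state == "rampup":
        # Hash flag name + user id so each user lands in a stable bucket 0..99.
        bucket = zlib.crc32(f"{name}:{user_id}".encode()) % 100
        return bucket < flag["percent"]
    return False
```

Hashing the flag name together with the user id keeps a user's experience stable across requests while giving different features independent buckets.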
Failure is not an option
Failure is inevitable!
Failure is a learning opportunity!
Failure is DETECTABLE!
Access
Detect problems quickly
CONFIDENCE
A:    Well, the Ops team manages the network, racks
     the servers, installed the monitoring tools, wears
                the pagers, blah, blah, blah...
Engineers build the application
Logging
      Graphing
OPS              ENG
      Trending
      Alerting
“Engineers are too busy writing
  features to build metrics.”
Metrics are part of every feature
        ...and so are config flags
Dead Simple
Simple, open source tools
Cacti (network, SNMP)
Ganglia (machines)
Graphite (application)
Splunk (log analysis, nightly reports)
Nagios (alerting)
                             Logging
                             Logster
                               StatsD
Ganglia
Cluster-oriented
Huge library of community-contributed recipes
Custom metrics (gmetric)
Graphite
                            Single-instance
              Create new metrics on-the-fly
   Customize via URLs and display functions
Logging
It’s 2:48 PM.
Do you know where your
       logs are?
Logger::log_error("User login failed. Reason: $msg for $username", "login");
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%{True-Client-IP}i %l %t \"%r\"
         %>s %b \"%{Referer}i\"
              \"%{User-Agent}i\"
    %{etsy_shop_id}n %{etsy_uaid}n %V
           %{etsy_ab_selections}n
            %{etsy_request_uuid}n
         %{etsy_api_consumer_key}n
          %{etsy_api_method_name}n
        %{php_memory_usage_bytes}n
   %{php_time_microsec}n %D" combined
apache_note()
grep "/listing/" access.log | 
awk '{sum=sum+$(NF-2)} END {print sum/NR}'
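The same one-liner, sketched in Python for cases where the analysis outgrows awk. The function name `average_field` and its parameters are assumptions for illustration; `field_from_end=3` corresponds to awk's `$(NF-2)`. Unlike the awk version, this sketch skips lines whose field is missing or non-numeric instead of counting them as zero.

```python
def average_field(lines, path_fragment="/listing/", field_from_end=3):
    """Average one whitespace-separated field over matching log lines.

    field_from_end=3 selects the third-from-last field, i.e. awk's $(NF-2).
    """
    total = 0.0
    count = 0
    for line in lines:
        if path_fragment not in line:
            continue  # same role as the grep filter
        fields = line.split()
        try:
            total += float(fields[-field_from_end])
        except (IndexError, ValueError):
            continue  # skip short or non-numeric lines
        count += 1
    return total / count if count else 0.0
```

Any iterable of lines works, so an open file handle for `access.log` can be passed in directly.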
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Help me, Rhonda.
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0001   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0201   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0034   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web1101   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0201   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0055   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
web0002   [04:28:54   2011]   [warning] [client 10.101.x.x] Sky is falling.
web0089   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0020   [04:28:54   2011]   [error] [client 10.101.x.x] Sky is falling.
web1101   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0055   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0001   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0034   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0087   [04:28:54   2011]   [fatal] [client 10.101.x.x] Sky is falling.
web0002   [04:28:54   2011]   [error] [client 10.101.x.x] Oh noooooo!
web0201   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!
web0077   [04:28:54   2011]   [warning] [client 10.101.x.x] Gaaaaahhh!
web0355   [04:28:54   2011]   [warning] [client 10.101.x.x] Oh nooooooooooo
web0052   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0001   [04:28:54   2011]   [error] [client 10.101.x.x] Gaaaaahhh!!!
web0003   [04:28:54   2011]   [error] [client 10.101.x.x] You've been eaten by a grue.
web0066   [04:28:54   2011]   [fatal] [client 10.101.x.x] Gaaaaahhh!!!
Logster
Fatals       Errors   Warnings
Logster
Run by cron
Keeps a cursor on your log file
Aggregate lines any way you want
Output to Ganglia or Graphite
Simple parsers
                                  github.com/etsy
web0054 [Fri Mar 04 16:27:48 2011]
[error] [login] [mk04gw1p71] User login
 failed. Reason: wrong password for ...
^\S+ \[[^\]]+\] \[(?P<log_level>[^\]]+)\]
if (fields['log_level'] == "fatal"):
   self.fatals += 1

elif (fields['log_level'] == "error"):
   self.errors += 1

elif (fields['log_level'] == "warning"):
   self.warnings += 1

...
MetricObject("fatals",
  (self.fatals / self.duration), "per sec")

MetricObject("errors",
  (self.errors / self.duration), "per sec")

MetricObject("warning",
  (self.warnings / self.duration), "per sec")
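Put together, the parser fragments above amount to something like the following. This is a self-contained sketch that deliberately does not import Logster, so the class name `ErrorLevelCounter` and the plain `(name, rate)` tuples it returns are stand-ins for Logster's own parser base class and `MetricObject`. The regex also tolerates runs of spaces between fields, which is an assumption about the log layout.

```python
import re

# Host, bracketed timestamp, then the bracketed log level we want to count.
LINE_RE = re.compile(r"^\S+\s+\[[^\]]+\]\s+\[(?P<log_level>[^\]]+)\]")


class ErrorLevelCounter:
    """Counts fatal/error/warning lines and reports them as per-second rates."""

    def __init__(self):
        self.counts = {"fatal": 0, "error": 0, "warning": 0}

    def parse_line(self, line):
        """Called once per new log line; lines that do not match are ignored."""
        match = LINE_RE.match(line)
        if match and match.group("log_level") in self.counts:
            self.counts[match.group("log_level")] += 1

    def get_state(self, duration):
        """Return (name, rate per second) pairs for the elapsed interval."""
        return [(level, n / duration) for level, n in self.counts.items()]
```

Reporting rates rather than raw counts keeps the graph comparable even when the cron interval changes.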
Fatals   Errors   Warnings
StatsD
                           Network daemon (node.js)
                               Accepts data over UDP
                      Flushes to Graphite every 10 sec
                                     One line of code
github.com/etsy
StatsD::increment("logins.success");




                                  logins
StatsD::timing("gearman.time", $msec);



                                 90th pct

                                 average

                                 lower
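What those one-liners put on the wire is small enough to sketch by hand. The following Python sketch formats a counter and a timer in the StatsD line format and sends them over UDP. The helper names, the default host, and the use of port 8125 (the usual StatsD default) are assumptions about a local setup, not part of the slides.

```python
import socket


def format_counter(bucket, delta=1):
    """Counter line, e.g. b'logins.success:1|c'."""
    return f"{bucket}:{delta}|c".encode("ascii")


def format_timing(bucket, msec):
    """Timer line in milliseconds, e.g. b'gearman.time:320|ms'."""
    return f"{bucket}:{msec}|ms".encode("ascii")


def send(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget over UDP: no reply is awaited, so a missing
    StatsD daemon never slows the application down."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

Usage would be a single call such as `send(format_counter("logins.success"))` at the point where the event happens.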
Ad hoc
name value timestamp
echo "events.deploy.site 1 `date +%s`" |
     nc graphite.etsycorp.com 2003
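The same "name value timestamp" line can be sent from application or deploy code. A minimal Python sketch, assuming a Graphite plaintext listener on port 2003 as in the command above; the function names are made up for illustration and the host is left as a required argument.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext line: 'name value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{name} {value} {int(timestamp)}\n".encode("ascii")


def send_metric(name, value, host, port=2003, timestamp=None):
    """Open a TCP connection, write a single metric line, and close."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_metric(name, value, timestamp))
```

A deploy script could call `send_metric("events.deploy.site", 1, host)` once per push to mark the event on every graph.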
Vertical Line Technology!
target=drawAsInfinite(events.deploy.site)
We could stare at graphs all day...
http://graphite/render?
   from=-1hours&width=600&height=200
&target=webs.errorLog.warning&rawData=1

webs.errorLog.warning,1318444930,1318448530,60|
5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,
1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,
1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,
5.0,1.0,1.0,None
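The raw output is easy to consume from a script: a header of target, start, end, and step, then a pipe, then comma-separated values with `None` for gaps. A small Python sketch of a parser, with a hypothetical function name and return shape, assuming each series arrives on a single line (the wrapping above is only slide layout):

```python
def parse_raw_data(text):
    """Parse rawData output into {target: (start, end, step, values)}."""
    series = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header, _, body = line.partition("|")
        # Split from the right so targets that contain commas stay intact.
        target, start, end, step = header.rsplit(",", 3)
        values = [
            None if v == "None" else float(v)
            for v in body.split(",")
            if v
        ]
        series[target] = (int(start), int(end), int(step), values)
    return series
```

From there, a cron job can compare the most recent values against a threshold without anyone staring at the graph.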
Holt-Winters Confidence Bands

upper

         lower
Holt-Winters Aberration
Business metrics
 + Confidence bands
_____________
    Alertable metrics
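One way to read that equation in code: given a metric and its upper and lower confidence bands, flag the points that fall outside. The Python sketch below takes the bands as plain lists (for example, fetched separately from Graphite) and does not compute Holt-Winters itself. The function names and the `min_points` rule are assumptions for illustration.

```python
def aberration(values, lower, upper):
    """Signed distance outside the band per point.

    0.0 means inside the band, positive means above the upper band,
    negative means below the lower band, None marks a gap in the data.
    """
    out = []
    for v, lo, hi in zip(values, lower, upper):
        if v is None or lo is None or hi is None:
            out.append(None)
        elif v > hi:
            out.append(v - hi)
        elif v < lo:
            out.append(v - lo)
        else:
            out.append(0.0)
    return out


def should_alert(values, lower, upper, min_points=3):
    """Alert only when the last min_points samples are all outside the band."""
    recent = aberration(values, lower, upper)[-min_points:]
    return len(recent) == min_points and all(
        d is not None and d != 0 for d in recent
    )
```

Requiring several consecutive points outside the band is a simple guard against paging someone for a single noisy sample.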
40,000+ metrics at Etsy
  Systems, Applications, Business
Dashboards
Kind of Hard :-/
<a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or
+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
     <img src="http://graphite.etsycorp.com/render?
from=-1hours&width=280&height=220&title=File+or+Script+Not
+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite
%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production
%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite
%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,
%23ff0000,%23006633,%23cc6600">
</a>
Super Easy!
$g = new Graphite($time);
$g->setTitle('File Not Found');
$g->addMetric('webs.errorLog.notExist', '#00cc00');
echo $g->getDashboardHTML(280, 220);
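The helper class mostly hides URL building. A rough Python sketch of that part, assuming the render parameters seen in the long URL above (`from`, `width`, `height`, `title`, `yMin`, `target`, `colorList`); the function name, its signature, and the example host are hypothetical and not the API of the PHP class on the slide.

```python
from urllib.parse import urlencode


def render_url(base, title, metrics, events=(), width=280, height=220,
               since="-1hours"):
    """Build a Graphite render URL.

    metrics is a list of (target, color) pairs; events is a list of
    event targets to overlay as vertical lines.
    """
    params = [
        ("from", since),
        ("width", width),
        ("height", height),
        ("title", title),
        ("yMin", 0),
    ]
    params += [("target", target) for target, _ in metrics]
    # Events such as deploys are overlaid with drawAsInfinite().
    params += [("target", f"drawAsInfinite({event})") for event in events]
    colors = [color for _, color in metrics]
    if colors:
        params.append(("colorList", ",".join(colors)))
    return f"{base}/render?{urlencode(params)}"
```

Letting `urlencode` handle the escaping is the whole point: nobody has to type `%28` and `%23` by hand again.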
Metrics!
Metrics + Events
Metrics + Alerts
Metrics + Metrics
High-level, real-time visibility
Detect problems quickly
CONFIDENCE
Make them required features
Make them dead simple
Make them accessible
Make them!
Homework
codeascraft.etsy.com
github.com/etsy

Get in touch
mike @ etsy . com
@ mikebrittain

We’re always looking for people who are interested in this kind of stuff...



Thank You
etsy.com/careers
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 

Recently uploaded (20)

Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 

Metrics-Driven Engineering

  • 1. Metrics-Driven Engineering. Mike Brittain (@mikebrittain), Director of Engineering, Infrastructure. October 13, 2011
  • 3. How do people use Etsy? How many new visits? How many listings created? How many registrations? How many convos sent? How many purchases? How many new shops?
  • 4. What is the application doing? Search indexing? How fast are pages generating? Async tasks currently in queue? Developer API auth and rate limiting? Images resized and stored? Error and warning rates?
  • 5. Are the servers in good shape? Replication slave lag? Memcache hits/misses? Available connections? Database queries per second? Total outgoing bandwidth? CPU, Memory, I/O?
  • 12. $314 Million GMS 2010; $180 Million GMS 2009; $87 Million GMS 2008; $26 Million GMS 2007 (credit: pentarux, flickr)
  • 13. 25 Million Unique Visitors; 1 Billion page views per month (credit: pentarux, flickr)
  • 14. Engineering team grew 500% over 18 months (credit: martin_heigan, flickr)
  • 16. Always Be Shipping (credit: ibailemon, flickr)
  • 17. Always Be Shipping (even if it’s your first day) (credit: ibailemon, flickr)
  • 19. 90+ Engineers; 40+ Deploys / day (credit: misswired, flickr)
  • 23. Config Flags: Enable and disable features quickly
        $cfg = array(
            'checkout' => array('enabled' => 'on'),
            'homepage' => array('enabled' => 'on'),
            'profiles' => array('enabled' => 'on'),
            'new_search' => array('enabled' => 'off'),
        );
  • 24. Config Flags: Enable and disable features quickly. Plus “admin-only,” percentage ramp-up, A/B testing, whitelists, blacklists, etc...
        $cfg = array(
            'checkout' => array('enabled' => 'on'),
            'homepage' => array('enabled' => 'on'),
            'profiles' => array('enabled' => 'on'),
            'new_search' => array('enabled' => 'off'),
        );
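The slides name percentage ramp-up but do not show how it is configured. As a rough sketch only, here is one way a flag check with ramp-up could work in Python. The `rampup` state, the `percent` key, and the `bucket` and `is_enabled` helpers are all assumptions made for illustration, not Etsy's actual schema or code.

```python
import hashlib

# Hypothetical flag table modeled on the $cfg array from the slides.
# The "rampup" state and "percent" key are assumptions for this sketch.
CFG = {
    "checkout": {"enabled": "on"},
    "new_search": {"enabled": "rampup", "percent": 10},
    "profiles": {"enabled": "off"},
}


def bucket(feature: str, user_id: str) -> int:
    """Map a (feature, user) pair to a stable bucket in 0..99."""
    digest = hashlib.sha1(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature: str, user_id: str, cfg=CFG) -> bool:
    """Return True if the feature should be shown to this user."""
    flag = cfg.get(feature, {"enabled": "off"})  # unknown flags default to off
    state = flag["enabled"]
    if state == "on":
        return True
    if state == "rampup":
        # The same user always lands in the same bucket, so raising
        # "percent" only ever adds users, it never flips existing ones off.
        return bucket(feature, user_id) < flag.get("percent", 0)
    return False
```

Hashing the feature name together with the user ID keeps a user's experience stable across requests while giving each feature an independent slice of users.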
  • 25. Failure is not an option
  • 27. Failure is not an option. Failure is inevitable! A learning opportunity!
  • 28. Failure is not an option. Failure is inevitable! A learning opportunity! DETECTABLE!
  • 36. A: Well, the Ops team manages the network, racks the servers, installed the monitoring tools, wears the pagers, blah, blah, blah...
  • 37. Engineers build the application
  • 38. OPS / ENG: Logging, Graphing, Trending, Alerting
  • 39. “Engineers are too busy writing features to build metrics.”
  • 40. Metrics are part of every feature ...and so are config flags
  • 43. Cacti (network, SNMP); Ganglia (machines); Graphite (application); Splunk (log analysis, nightly reports); Nagios (alerting). Logging: Logster, StatsD
  • 45. Ganglia: Cluster-oriented. Huge community-contributed recipes. Custom metrics (gmetad)
  • 47. Graphite: Single-instance. Create new metrics on-the-fly. Customize via URLs and display functions
  • 49. It’s 2:48 PM. Do you know where your logs are?
  • 50. Logger::log_error("User login failed. Reason: $msg for $username", "login");
  • 52. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
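A structured line like this is easy to split into fields. The following is a minimal Python sketch that parses the sample line above. The field names (`host`, `timestamp`, `log_level`, `namespace`, `token`) are guesses at what each bracketed value means and do not come from the slides.

```python
import re

# Field names are assumptions; the slides show the line layout only.
LINE_RE = re.compile(
    r"^(?P<host>\S+) "
    r"\[(?P<timestamp>[^\]]+)\] "
    r"\[(?P<log_level>[^\]]+)\] "
    r"\[(?P<namespace>[^\]]+)\] "
    r"\[(?P<token>[^\]]+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


sample = (
    "web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] "
    "User login failed. Reason: wrong password for ..."
)
fields = parse_line(sample)
```

Once lines are split this way, filtering by `log_level` or `namespace` becomes a dictionary lookup rather than a string search.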
  • 58. LogFormat "%h %l %u %t \"%r\" %>s %b" common
  • 59. LogFormat "%{True-Client-IP}i %l %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{etsy_shop_id}n %{etsy_uaid}n %V %{etsy_ab_selections}n %{etsy_request_uuid}n %{etsy_api_consumer_key}n %{etsy_api_method_name}n %{php_memory_usage_bytes}n %{php_time_microsec}n %D" combined
  • 64. grep "/listing/" access.log | awk '{sum=sum+$(NF-2)} END {print sum/NR}'
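For readers less familiar with awk, the one-liner above averages the third-from-last whitespace-separated field over every line containing "/listing/". A rough Python equivalent follows. The function name `average_field` and its parameters are made up for this sketch, and unlike awk it skips values that are not numeric instead of counting them as zero.

```python
def average_field(lines, needle="/listing/", offset_from_end=3):
    """Average one numeric field, counted from the end of each matching line."""
    total = 0.0
    count = 0
    for line in lines:
        if needle not in line:
            continue  # same filter as: grep "/listing/"
        parts = line.split()
        if len(parts) < offset_from_end:
            continue
        try:
            total += float(parts[-offset_from_end])  # awk's $(NF-2)
        except ValueError:
            continue  # skip non-numeric values such as "-"
        count += 1
    return total / count if count else None


example_lines = [
    "GET /listing/1 200 100 5 7",
    "GET /listing/2 200 300 5 7",
    "GET /shop/3 200 900 5 7",
]
result = average_field(example_lines)
```

Counting fields from the end of the line is what keeps this robust when earlier fields, such as a quoted user agent, contain spaces.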
  • 65.
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Help me, Rhonda.
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Heeeeeeellllllllllllllppppp!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
        web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
        web0201 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
        web0034 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web1101 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0201 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
        web0055 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
        web0002 [04:28:54 2011] [warning] [client 10.101.x.x] Sky is falling.
        web0089 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0020 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.
        web1101 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
        web0055 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
        web0001 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0034 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0087 [04:28:54 2011] [fatal] [client 10.101.x.x] Sky is falling.
        web0002 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
        web0201 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!
        web0077 [04:28:54 2011] [warning] [client 10.101.x.x] Gaaaaahhh!
        web0355 [04:28:54 2011] [warning] [client 10.101.x.x] Oh nooooooooooo
        web0052 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0001 [04:28:54 2011] [error] [client 10.101.x.x] Gaaaaahhh!!!
        web0003 [04:28:54 2011] [error] [client 10.101.x.x] You've been eaten by a grue.
        web0066 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!!!
  • 66. Logster: Fatals, Errors, Warnings
  • 67. Logster: Run by cron. Keeps a cursor on your log file. Aggregate lines any way you want. Output to Ganglia or Graphite. Simple parsers. github.com/etsy
  • 68. web0054 [Fri Mar 04 16:27:48 2011] [error] [login] [mk04gw1p71] User login failed. Reason: wrong password for ...
  • 70.
        if (fields['log_level'] == "fatal"):
            self.fatals += 1
        elif (fields['log_level'] == "error"):
            self.errors += 1
        elif (fields['log_level'] == "warning"):
            self.warnings += 1
        ...
  • 71.
        MetricObject("fatals", (self.fatals / self.duration), "per sec")
        MetricObject("errors", (self.errors / self.duration), "per sec")
        MetricObject("warning", (self.warnings / self.duration), "per sec")
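The two fragments above can be put together into a small, self-contained counter. This sketch deliberately does not use Logster's own classes: `LevelCounter`, `parse_line`, and `rates` are illustrative names, and the regular expression assumes the bracketed level format shown in the sample log lines.

```python
import re

# Matches the bracketed level in lines such as:
#   web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!
LEVEL_RE = re.compile(r"\] \[(?P<log_level>fatal|error|warning)\] ")


class LevelCounter:
    """Stand-alone sketch of a per-level counter; not Logster's real API."""

    def __init__(self):
        self.fatals = 0
        self.errors = 0
        self.warnings = 0

    def parse_line(self, line):
        match = LEVEL_RE.search(line)
        if not match:
            return  # ignore lines without a recognized level
        level = match.group("log_level")
        if level == "fatal":
            self.fatals += 1
        elif level == "error":
            self.errors += 1
        elif level == "warning":
            self.warnings += 1

    def rates(self, duration):
        """Return events per second over `duration` seconds."""
        if duration <= 0:
            raise ValueError("duration must be positive")
        return {
            "fatals": self.fatals / duration,
            "errors": self.errors / duration,
            "warnings": self.warnings / duration,
        }


counter = LevelCounter()
for log_line in [
    "web0001 [04:28:54 2011] [fatal] [client 10.101.x.x] Gaaaaahhh!",
    "web0001 [04:28:54 2011] [error] [client 10.101.x.x] Oh noooooo!",
    "web0201 [04:28:54 2011] [error] [client 10.101.x.x] Sky is falling.",
    "a line with no level",
]:
    counter.parse_line(log_line)
per_second = counter.rates(duration=2)
```

Dividing by the interval between runs turns raw counts into rates, which stay comparable even if the cron schedule changes.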
  • 72. Fatals, Errors, Warnings
  • 74. StatsD: Network daemon (node.js). Accepts data over UDP. Flushes to Graphite every 10 sec. One line of code. github.com/etsy
  • 78. StatsD::timing("gearman.time", $msec); (90th pct, average, lower)
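Under the hood, a timing call like this amounts to a single UDP datagram in StatsD's plain-text format, `name:value|ms`. A minimal Python sketch of that idea follows. The helper names are invented here, and the default host and port (127.0.0.1, 8125) are assumptions about where a StatsD daemon would be listening.

```python
import socket


def format_timing(name, msec):
    """Build a StatsD timing datagram, e.g. b'gearman.time:320|ms'."""
    return f"{name}:{msec}|ms".encode("ascii")


def send_timing(name, msec, host="127.0.0.1", port=8125):
    """Fire-and-forget: send one timing sample over UDP."""
    payload = format_timing(name, msec)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload
```

Because UDP needs no connection or acknowledgement, a slow or missing StatsD daemon cannot block the application, which is what makes instrumenting with one line of code low-risk.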
  • 79. Ad hoc: name value timestamp
  • 80. echo "events.deploy.site 1 `date +%s`" | nc graphite.etsycorp.com 2003
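The shell one-liner above sends a single "name value timestamp" line to Graphite's plaintext listener on port 2003. Roughly the same thing in Python might look like the sketch below. The function names are illustrative, and the host is left as a required argument rather than assumed.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext line: 'name value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # same role as `date +%s`
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(name, value, host, port=2003, timestamp=None):
    """Open a TCP connection, write one line, and close (like `nc`)."""
    line = format_metric(name, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
    return line
```

A deploy script could call `send_metric("events.deploy.site", 1, host=...)` at the end of each push so that deploys appear as events alongside other metrics.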
  • 83. We could stare at graphs all day...
  • 84. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
  • 85. http://graphite/render?from=-1hours&width=600&height=200&target=webs.errorLog.warning&rawData=1
        webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,1.0,0.0,9.0,0.0,1.0,3.0,2.0,1.0,6.0,2.0,6.0,3.0,6.0,4.0,4.0,2.0,1.0,1.0,8.0,2.0,3.0,6.0,3.0,5.0,3.0,0.0,4.0,6.0,2.0,0.0,2.0,0.0,4.0,0.0,3.0,1.0,3.0,4.0,2.0,10.0,3.0,0.0,6.0,0.0,4.0,2.0,5.0,18.0,1.0,1.0,2.0,1.0,8.0,5.0,1.0,1.0,None
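The raw response shown above has a simple shape: a header of `name,start,end,step`, a `|`, then comma-separated values, with `None` marking intervals that have no data yet. A small Python sketch of a parser for that shape follows. The function name and the returned dictionary keys are choices made for this example.

```python
def parse_raw_series(line):
    """Parse 'name,start,end,step|v1,v2,...' into a dict."""
    header, sep, body = line.strip().partition("|")
    if not sep:
        raise ValueError("expected 'name,start,end,step|values'")
    # Split from the right so commas inside the target name are preserved.
    name, start, end, step = header.rsplit(",", 3)
    values = []
    for raw in body.split(","):
        raw = raw.strip()
        if not raw:
            continue
        values.append(None if raw == "None" else float(raw))
    return {
        "name": name,
        "start": int(start),
        "end": int(end),
        "step": int(step),
        "values": values,
    }


series = parse_raw_series(
    "webs.errorLog.warning,1318444930,1318448530,60|5.0,1.0,3.0,None"
)
# Sum the known data points, ignoring gaps.
total_warnings = sum(v for v in series["values"] if v is not None)
```

With the values available as plain numbers, a script can compare them against a threshold instead of relying on someone watching the graph.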
  • 88. Business metrics + Confidence bands = Alertable metrics
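The slide does not say how the confidence bands are computed. Purely as an illustration of the idea, the sketch below uses a much simpler stand-in: a band of the mean plus or minus `k` standard deviations over recent history, with an alert when the latest value falls outside it. The function names and the default `k=3.0` are assumptions, and this is not presented as the method used at Etsy.

```python
import statistics


def confidence_band(history, k=3.0):
    """Return (lower, upper) as mean +/- k population standard deviations."""
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread


def is_alertable(history, latest, k=3.0):
    """True when the latest value falls outside the band."""
    lower, upper = confidence_band(history, k)
    return latest < lower or latest > upper


# Hypothetical per-minute counts for some business metric.
recent = [10, 12, 11, 9, 10, 11, 12, 10]
```

A fixed band like this ignores daily and weekly seasonality, so a production setup would likely need a forecast-based band instead.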
  • 89. 40,000+ metrics at Etsy: Systems, Applications, Business
  • 92. Kind of Hard :-/
        <a href="http://graphite.etsycorp.com/render?from=-1hours&width=800&height=600&title=File+or+Script+Not+Found&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
        <img src="http://graphite.etsycorp.com/render?from=-1hours&width=280&height=220&title=File+or+Script+Not+Found&hideLegend=1&yMin=0&target=webs.errorLog.notExist&target=drawAsInfinite%28deploys.config.production%29&target=drawAsInfinite%28deploys.web.production%29&target=drawAsInfinite%28deploys.search.production%29&target=drawAsInfinite%28deploys.imagestorage.other%29&colorList=%2300cc00,%230000ff,%23ff0000,%23006633,%23cc6600">
        </a>
  • 93. Super Easy! $g = new Graphite($time); $g->setTitle('File Not Found'); $g->addMetric('webs.errorLog.notExist', '#00cc00'); echo $g->getDashboardHTML(280, 220);
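The slide shows only the calling side of the PHP helper. To illustrate what such a wrapper might do internally, here is a rough Python sketch that assembles a Graphite render URL and the thumbnail-linked-to-full-size HTML from the previous slide. The class name, method names, and default sizes are invented for this example. The query parameters (`from`, `width`, `height`, `title`, `hideLegend`, `target`, `colorList`) are the ones that appear in the hand-written URL.

```python
from html import escape
from urllib.parse import urlencode


class GraphiteGraph:
    """Hypothetical helper, loosely mirroring the PHP calls on the slide."""

    def __init__(self, base_url, time_range="-1hours"):
        self.base_url = base_url.rstrip("/")
        self.time_range = time_range
        self.title = None
        self.metrics = []  # list of (target, color) pairs

    def set_title(self, title):
        self.title = title

    def add_metric(self, target, color=None):
        self.metrics.append((target, color))

    def render_url(self, width, height, hide_legend=False):
        params = [("from", self.time_range), ("width", width), ("height", height)]
        if self.title:
            params.append(("title", self.title))
        if hide_legend:
            params.append(("hideLegend", 1))
        # One "target" parameter per metric, in the order they were added.
        params.extend(("target", target) for target, _ in self.metrics)
        colors = [color for _, color in self.metrics if color]
        if colors:
            params.append(("colorList", ",".join(colors)))
        return f"{self.base_url}/render?{urlencode(params)}"

    def dashboard_html(self, width, height, full_width=800, full_height=600):
        """Small thumbnail image that links to a larger version of the graph."""
        thumb = escape(self.render_url(width, height, hide_legend=True), quote=True)
        full = escape(self.render_url(full_width, full_height), quote=True)
        return f'<a href="{full}"><img src="{thumb}"></a>'


g = GraphiteGraph("http://graphite.example.com/")
g.set_title("File Not Found")
g.add_metric("webs.errorLog.notExist", "#00cc00")
markup = g.dashboard_html(280, 220)
```

Letting a library handle the URL encoding avoids the hand-escaped `%28` and `%23` sequences that make the raw version hard to write and review. Note that this sketch pairs colors with targets by position, so it assumes either every metric or no metric is given a color.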
  • 97. Metrics! Metrics + Events. Metrics + Alerts. Metrics + Metrics.
  • 101. Make them required features
  • 102. Make them dead simple
  • 105. Homework: codeascraft.etsy.com, github.com/etsy. Get in touch: mike @ etsy . com, @mikebrittain. We’re always looking for people who are interested in this kind of stuff... etsy.com/careers. Thank You