Integrating Multiple CDN Providers
Our experiences at Etsy

@lozzd • @ickymettle
Marcus Barczak

Laurie Denness

Staff Operations Engineers
Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013

Relying on a single content delivery network for your site can impose a number of flexibility limitations. By diversifying your CDN providers you can put the power back in your hands, allowing you to get the best of both worlds in terms of performance, reliability and cost. In this talk Marcus and Laurie will present Etsy’s recent work integrating multiple CDN providers into their site delivery infrastructure.

This presentation was delivered at Velocity Europe, November 2013

Published in: Technology

Transcript of "Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013"

  1. Integrating Multiple CDN Providers
      Our experiences at Etsy
      @lozzd • @ickymettle
  2. Marcus Barczak, Laurie Denness
      Staff Operations Engineers
  5. Beginning of 2010 / Today
  6. Background
      ▪ First started using a single CDN in 2008
      ▪ Exponential Growth
      ▪ Start of 2012 began investigation into running multiple CDNs
  7. Why use a CDN?
      ▪ Goal: Consistently fast user experience globally
      ▪ Improve last mile performance by caching content close to the user
      ▪ Offload content delivery from origin infrastructure to the CDN provider
  8. Why use more than one CDN?
      ▪ Resilience
        - Eliminate single point of failure
      ▪ Flexibility
        - Balance traffic based on business requirements
      ▪ Cost
        - Manage provider costs
  9. The Plan
      http://www.flickr.com/photos/malloy/195204215
  10. The Plan
      1. Establish evaluation criteria
      2. Initial configuration and testing
      3. Test with production traffic
      4. Operationalising
  11. Evaluation Criteria
      http://www.flickr.com/photos/49212595@N00/5646403386
  12. Evaluation Criteria
      ▪ Performance
      ▪ Configuration
      ▪ Reporting, Metrics and Logging
      ▪ Culture
  13. Performance
      ▪ Baseline Response Times
        - Should be within ±5% of our existing CDN provider’s response times
      ▪ Hit Ratios and Origin Offload
        - Provider should achieve equivalent or better origin offload performance and hit ratios
  14. Configuration
      ▪ Complexity
        - How complex is the provider’s configuration system?
      ▪ Self service
        - Can you make changes directly, or do they require professional services or other intervention?
      ▪ Latency for changes
        - How quickly do changes propagate?
  15. Reporting, Metrics and Logging
      ▪ Resolution
      ▪ Latency
      ▪ Delivery
      ▪ Customisation
  16. Culture
      ▪ Understand our culture
      ▪ Postmortems
      ▪ Access to technical staff
      ▪ Shared success
  17. Initial Configuration and Testing
      http://www.flickr.com/photos/7269902@N07/4592239326
  18. Clean the house
      http://www.flickr.com/photos/mastergeorge/8562623590
  19. Clean the house
      ▪ Managing caching TTLs from origin
        - CDNs honour the origin cache-control headers!
          <LocationMatch "\.(gif|jpg|jpeg|png|css|js)$">
              Header set Cache-Control "max-age=94670800"
          </LocationMatch>
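Because the CDNs cache for as long as the origin's Cache-Control header says, it can be worth checking what a given max-age amounts to. The following is a minimal Python sketch, assuming a plain header with a `max-age` directive; the helper name `max_age_seconds` is ours and does not come from the talk.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if the directive is absent."""
    match = re.search(r"(?:^|[,\s])max-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


# The value below is the one set in the LocationMatch block above.
ttl = max_age_seconds("max-age=94670800")
days = ttl / 86400  # 86400 seconds per day
print(f"{ttl} seconds is about {days:.0f} days ({days / 365:.1f} years)")
```

The max-age from the slide works out to roughly three years, so cached static assets are effectively treated as immutable.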
  20. Clean the house
      ▪ Manage gzip compression from origin
        - Honoured by CDNs
        - Compression from origin to CDN
          ## mod_deflate compression - see OPS-1537 ##
          AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript [..]
  21. Clean the house
      If you can do it at origin, do it at origin
  22. Mean Time To Curl
      http://www.flickr.com/photos/wwarby/3297205226
  23. curl -i -H 'Host: img0.etsystatic.com' global-ssl.fastly.net/someimage.jpg
      HTTP/1.1 200 OK
      Server: Apache
      Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
      Cache-Control: max-age=94670800
      [...]
      X-Served-By: cache-lo82-LHR
      X-Cache: MISS
      X-Cache-Hits: 0
  24. curl -i -H 'Host: img0.etsystatic.com' global-ssl.fastly.net/someimage.jpg
      HTTP/1.1 200 OK
      Server: Apache
      Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
      Cache-Control: max-age=94670800
      [...]
      X-Served-By: cache-lo82-LHR
      X-Cache: HIT
      X-Cache-Hits: 1
  25. Mean Time To Curl = Done
      https://www.etsy.com/listing/99871278
  26. Mean Time To Curl
      ▪ No need to touch existing infrastructure
      ▪ Smoke test of functionality
      ▪ 10 minutes from configuration to curl
      ▪ New providers should be plug and play
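The curl smoke test above can also be scripted. The following is a rough Python sketch of the same check, not something shown in the talk: it sends a GET to the CDN edge hostname while presenting the site hostname in the Host header, then classifies the response from its X-Cache header. The function names are illustrative, and the classifier assumes a single-valued X-Cache header like the one in the curl output.

```python
import http.client


def fetch_cache_headers(edge_host: str, site_host: str, path: str) -> dict:
    """GET `path` from `edge_host`, presenting `site_host` in the Host header."""
    conn = http.client.HTTPConnection(edge_host, timeout=10)
    try:
        conn.request("GET", path, headers={"Host": site_host})
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return {name.lower(): value for name, value in response.getheaders()}
    finally:
        conn.close()


def cache_status(headers: dict) -> str:
    """Classify a response as HIT, MISS or UNKNOWN from its X-Cache header."""
    value = headers.get("x-cache", "").upper()
    if "HIT" in value:
        return "HIT"
    if "MISS" in value:
        return "MISS"
    return "UNKNOWN"


def smoke_test(edge_host: str, site_host: str, path: str) -> list:
    """Request the same object twice and return the cache status of each try."""
    return [
        cache_status(fetch_cache_headers(edge_host, site_host, path))
        for _ in range(2)
    ]
```

Calling `smoke_test("global-ssl.fastly.net", "img0.etsystatic.com", "/someimage.jpg")` would repeat the two requests from the slides; on a working configuration the second request would be expected to report a HIT.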
  27. Testing In Production
      http://www.flickr.com/photos/solarnu/10646426865
  28. Testing with Production Traffic
      ▪ Images only at first
      ▪ Good test of caching performance
      ▪ Easy to test by swapping hostnames
      ▪ Made even easier with our A/B testing framework
  29. A/B Test Framework
      ▪ Fine grained control
      ▪ Enable test for specific users or groups
      ▪ Percentage of users
      ▪ All controlled via configuration in code
      ▪ Rapid and complete rollback
  30. Configure Mappings to CDNs
      $server_config["image"] = array(
          'akamai' => array(
              'img0-ak.etsystatic.com',
              'img1-ak.etsystatic.com',
          ),
          'edgecast' => array(
              'img0-ec.etsystatic.com',
              'img1-ec.etsystatic.com',
          ),
          'fastly' => array(
              'img0-f.etsystatic.com',
              'img1-f.etsystatic.com',
          ),
      );
  31. Test Controls
      $server_config['ab']['cdn'] = array(
          'enabled' => 'on',
          'weights' => array(
              'akamai'   => 0.0,
              'edgecast' => 0.0,
              'fastly'   => 0.0,
              'origin'   => 100.0,
          ),
          'override' => 'cdn_diversity',
      );
  32. Metrics and Monitoring
      http://www.flickr.com/photos/nicolasfleury/6073151084
  33. Metrics and Monitoring
      Even if it doesn’t move, graph it anyway
  34. Metrics and Monitoring
      Simplest approach: Provider’s dashboards
  35-36. Metrics and Monitoring
      ▪ Get more detail by pulling metrics in house
      ▪ Write script to pull data from API
      ▪ Create dashboards with data
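As an illustration of the "pull from API, push to dashboards" step, here is a small Python sketch that formats a set of CDN statistics as Graphite plaintext lines and sends them to a Carbon listener. The metric names, the `cdn.fastly` prefix and the sample numbers are made up; the call to the vendor API is left out because each provider's API differs.

```python
import socket
import time


def graphite_lines(prefix: str, stats: dict, timestamp: int) -> list:
    """Render stats as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{name} {value} {timestamp}"
        for name, value in sorted(stats.items())
    ]


def send_to_graphite(lines: list, host: str, port: int = 2003) -> None:
    """Send lines to a Carbon plaintext listener (2003 is the default port)."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Hypothetical numbers standing in for whatever the provider's API returns.
sample = {"requests": 120000, "hit_ratio": 0.97, "bandwidth_bytes": 5300000000}
for line in graphite_lines("cdn.fastly", sample, int(time.time())):
    print(line)
```

Run from cron once a minute, with `send_to_graphite` pointed at a real Carbon host, a script of this shape would put every provider's metrics under one consistent naming scheme.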
  39. Testing Plan
      1. for c in $cdns; do rampup $c; done;
      2. Deliberately slow and steady
      3. Watch traffic increase
      4. Watch origin offload increase
      5. Watch performance
  40-41. Downsides of this approach
      ▪ A/B testing can’t be used for main site
      ▪ Exposing your test CNAMEs
      ▪ Especially if hotlinking is a concern
  42. How do you know it’s broke?
      ▪ Check the graphs!
      ▪ Check with your community
      ▪ Keep support in the loop
  43. Operationalising
      http://www.flickr.com/photos/98047351@N05/9706165200
  44. Content Partitioning
  45-48. Etsy’s site partitioning
      Dynamic HTML Content: www.etsy.com
      Static Assets (js, css, fonts): site.etsystatic.com
      Listing Images, Avatars: imgX.etsystatic.com
  49. Balancing Traffic in Production
      http://www.flickr.com/photos/wok_design/2499217405
  50. Balancing Traffic Using DNS
      ▪ Traffic Manager
      ▪ Extends DNS to dynamically return records based on rules
      ▪ Weighted round robin
  51-52. Balancing Traffic Using DNS
      [2589:~] $ dig +short www.etsy.com
      www.etsy.com.edgekey.net.
      e2463.b.akamaiedge.net.
      23.74.122.37
      [2589:~] $ dig +short www.etsy.com
      etsy.com.
      38.123.123.123
      [2589:~] $ dig +short www.etsy.com
      cs34.adn.edgecastcdn.net.
      93.184.219.54
      [2589:~] $ dig +short www.etsy.com
      global-ssl.fastly.net.
      185.31.19.184
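To get a feel for how weighted DNS answers spread traffic, the behaviour can be simulated. This Python sketch uses weighted random selection, which approximates a weighted round robin over many queries; it is not the traffic manager's actual algorithm, and the weights are invented because the talk does not give the production split.

```python
import random
from collections import Counter

# Hypothetical weights; the CNAME targets are the ones in the dig output above.
TARGETS = {
    "www.etsy.com.edgekey.net.": 40,
    "cs34.adn.edgecastcdn.net.": 30,
    "global-ssl.fastly.net.": 30,
}


def resolve(rng: random.Random, targets: dict) -> str:
    """Return one CNAME target, chosen in proportion to its weight."""
    names = list(targets)
    return rng.choices(names, weights=[targets[n] for n in names], k=1)[0]


rng = random.Random(2013)  # seeded so repeated runs give the same sequence
answers = Counter(resolve(rng, TARGETS) for _ in range(10000))
for name, count in answers.most_common():
    print(f"{name:30} {count / 100:.1f}%")
```

Each simulated query stands for one resolver lookup. In practice resolvers cache answers for the record's TTL, so the observed split only approaches the configured weights over time.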
  53. Balancing Traffic Using DNS
      ▪ Rule updates typically made via web UI
      ▪ Can be slow and error prone
      ▪ Changes need to be applied to all three domains
      ▪ API available to make changes programmatically
  54. cdncontrol
      http://www.flickr.com/photos/foshydog/4441105829
  65. DNS balancing downsides
      ▪ Low TTLs for fast convergence
      ▪ Mo QPS == Mo Money
      ▪ More DNS lookups for users
      ▪ Not 100% instant or deterministic
  66. 50% within 1 minute
      Long Tail is Loooong
  67. Monitoring in Production
      http://www.flickr.com/photos/9229426@N05/5160787240
  68-69. Whoopsie Page
      ▪ Static HTML delivered for 5xx errors
        - Branding
        - Translated error messages
        - Links to status page
  70-76. Failure Beacons
      1. 1x1 tracking pixel embedded in page
           [...]
           <img src="//failure.etsy.com/status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com">
           </body>
           </html>
      2. Request creates an access log line
      3. Scrape them out every minute using logster
           self.reg = re.compile(r'^\S+(\s:)? (?P<remote_addr>[0-9.]+),? [0-9.,\- ]+ \[[^\]]+\] "GET /status/images/beacon.gif\?(beacon_)?source=(?P<source>\S+) HTTP/1.\d" \d+ [\d-]+ "(?P<referrer>[^"]+)" "(?P<user_agent>[^"]+)" .*$')
      4. Logster posts event counts to Graphite
      5. Alert on Graphite graph in Nagios
  77. Failure Beacons
      ▪ Client IP address can be geolocated
  78-79. Failure Beacons
      ▪ Optional extra debugging information
           [31/Oct/2013:07:06:42 +0000] "GET /status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com&provider_error=Connection%20timed%20out&server_identity=cache-ny57-NYC HTTP/1.1"
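The extra debugging fields travel as ordinary query parameters, so they can be recovered from the request line with the standard library. The Python sketch below is separate from the logster parser and only shows the idea; the function name `beacon_fields` is hypothetical, and the sample request line is taken from the log entry above.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def beacon_fields(request_line: str) -> dict:
    """Extract beacon query parameters from an HTTP request line."""
    parts = request_line.split()
    if len(parts) != 3 or parts[0] != "GET":
        return {}
    target = urlsplit(parts[1])
    if not target.path.endswith("/beacon.gif"):
        return {}
    # parse_qs percent-decodes values and returns a list per parameter.
    return {name: values[0] for name, values in parse_qs(target.query).items()}


line = (
    "GET /status/images/beacon.gif"
    "?beacon_source=fastly_origin_failure-etsy.com"
    "&provider_error=Connection%20timed%20out"
    "&server_identity=cache-ny57-NYC HTTP/1.1"
)
fields = beacon_fields(line)
counts = Counter([fields["beacon_source"]])
print(fields["provider_error"], dict(counts))
```

Counting by `beacon_source` gives the per-provider failure numbers that would be graphed and alerted on, while `provider_error` and `server_identity` point at the failing edge node.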
  80-81. Tracking Requests to Origin
      GET / HTTP/1.1
      User-Agent: curl/7.24.0
      Accept: */*
      X-Forwarded-Host: www.etsy.com
      [...]
      X-CDN-Provider: edgecast
      [...]
      Host: www.etsy.com
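With each CDN adding an X-CDN-Provider header, origin traffic can be attributed per provider. Here is a short Python sketch of that attribution, assuming raw request text is available. The label `direct` for requests without the header is our own convention, not something from the talk.

```python
from collections import Counter
from email.parser import Parser


def cdn_provider(raw_request: str) -> str:
    """Return the X-CDN-Provider value of a raw HTTP request, or 'direct'."""
    # Drop the request line; the remainder is an RFC 822 style header block.
    _request_line, _, header_block = raw_request.partition("\r\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    return headers.get("X-CDN-Provider", "direct").strip().lower()


requests = [
    "GET / HTTP/1.1\r\nHost: www.etsy.com\r\nX-CDN-Provider: edgecast\r\n\r\n",
    "GET / HTTP/1.1\r\nHost: www.etsy.com\r\n\r\n",
]
print(Counter(cdn_provider(r) for r in requests))
```

In production the same count would more likely come from an access log field than from raw requests, but the grouping logic is the same.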
  82-84. Backend Monitoring
      ▪ Logster on CDN provider header
      ▪ Vendor APIs to bring data in house
      ▪ Data in-house benefits include
        - Integration with our anomaly detection systems
        - Consistent and unified view of all CDN metrics
        - We control data retention period
  85. Awareness
      ▪ Over 100 engineers
      ▪ Deploying 60 times a day
      ▪ Correlating external and internal services
  87. Awareness
      Deploy lines
  88-89. Frontend Monitoring
      ▪ Performance is important to us
      ▪ Monitoring overall site performance
      ▪ Monitoring performance by CDN provider
      ▪ Real User Monitoring (SOASTA mPulse) on key pages to track real user page performance
  90. Downsides
      http://www.flickr.com/photos/39272170@N00/3841286802
  91-92. Debugging: What broke?
      ▪ MTTD/MTTR can be extremely low with this system
      ▪ But not always
  93. Debugging: What broke?
      ▪ Non technical member base
      ▪ Confusing and time consuming
      ▪ Amazing support team
      ▪ Log as much information as possible
  94. Conclusions/Takeaways
      http://www.flickr.com/photos/sk8geek/4649776194
  95. Great success
      ▪ 12 months in the benefits have far outweighed the few downsides
      ▪ We’re continuing to evolve the system
      ▪ We’ll be sure to share our experience with the community along the way
  96. Links/Open Source
      ▪ cdncontrol
        http://github.com/etsy/cdncontrol
        http://github.com/etsy/cdncontrol_ui
      ▪ logster
        http://github.com/etsy/logster
      ▪ CDN API to Graphite scripts
        http://github.com/lozzd/cdn_scripts
  97. Thanks! Questions?