Integrating multiple CDN providers at Etsy - Velocity Europe (London) 2013

Relying on a single content delivery network (CDN) for your site can impose a number of flexibility limitations. By diversifying your CDN providers you can put the power back in your hands, allowing you to get the best of both worlds in terms of performance, reliability and cost. In this talk, Marcus and Laurie present Etsy’s recent work integrating multiple CDN providers into its site delivery infrastructure.

This presentation was delivered at Velocity Europe, November 2013

Presentation Transcript

  • Integrating Multiple CDN Providers Our experiences at Etsy @lozzd • @ickymettle
  • Marcus Barczak and Laurie Denness, Staff Operations Engineers
  • [Figure with panels labelled "Beginning of 2010" and "Today"]
  • Background
      ▪ First started using a single CDN in 2008
      ▪ Exponential growth
      ▪ Start of 2012: began investigation into running multiple CDNs
  • Why use a CDN?
      ▪ Goal: consistently fast user experience globally
      ▪ Improve last mile performance by caching content close to the user
      ▪ Offload content delivery from origin infrastructure to the CDN provider
  • Why use more than one CDN?
      ▪ Resilience - Eliminate single point of failure
      ▪ Flexibility - Balance traffic based on business requirements
      ▪ Cost - Manage provider costs
  • The Plan (photo: http://www.flickr.com/photos/malloy/195204215)
  • The Plan
      1. Establish evaluation criteria
      2. Initial configuration and testing
      3. Test with production traffic
      4. Operationalising
  • Evaluation Criteria (photo: http://www.flickr.com/photos/49212595@N00/5646403386)
  • Evaluation Criteria
      ▪ Performance
      ▪ Configuration
      ▪ Reporting, Metrics and Logging
      ▪ Culture
  • Performance
      ▪ Baseline Response Times - Should be within ±5% of our existing CDN provider’s response times
      ▪ Hit Ratios and Origin Offload - Provider should achieve equivalent or better origin offload performance and hit ratios
  • Configuration
      ▪ Complexity - How complex is the provider’s configuration system?
      ▪ Self service - Can you make changes directly, or do they require professional services or other intervention?
      ▪ Latency for changes - How long do changes take to propagate?
  • Reporting, Metrics and Logging
      ▪ Resolution
      ▪ Latency
      ▪ Delivery
      ▪ Customisation
  • Culture
      ▪ Understand our culture
      ▪ Postmortems
      ▪ Access to technical staff
      ▪ Shared success
  • Initial Configuration and Testing (photo: http://www.flickr.com/photos/7269902@N07/4592239326)
  • Clean the house (photo: http://www.flickr.com/photos/mastergeorge/8562623590)
  • Clean the house
      ▪ Managing caching TTLs from origin
          - CDNs honour the origin cache-control headers!
        <LocationMatch "\.(gif|jpg|jpeg|png|css|js)$">
            Header set Cache-Control "max-age=94670800"
        </LocationMatch>
  • Clean the house
      ▪ Manage gzip compression from origin
          - Honoured by CDNs
          - Compression from origin to CDN
        ## mod_deflate compression - see OPS-1537 ##
        AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript [..]
  • Clean the house: If you can do it at origin, do it at origin
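
Because the CDNs honour whatever the origin sends, it can help to check the origin headers before adding a provider. The following is a minimal sketch, not part of the talk: the function names and the one-day threshold are assumptions chosen for illustration. It extracts `max-age` from a `Cache-Control` value and reports when it is missing or too short.

```python
import re


def max_age_seconds(cache_control):
    """Return the max-age value from a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|[,\s])max-age=(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def origin_header_problems(headers, min_max_age=86400):
    """List caching problems in a dict of origin response headers.

    min_max_age is a hypothetical threshold (one day), not a value from the talk.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    problems = []
    age = max_age_seconds(lowered.get("cache-control", ""))
    if age is None:
        problems.append("no max-age in Cache-Control")
    elif age < min_max_age:
        problems.append("max-age below threshold")
    return problems


# The TTL on the slide, expressed in days (roughly three years).
SLIDE_TTL_DAYS = round(94670800 / 86400, 1)
```

With the header from the slide, `origin_header_problems({"Cache-Control": "max-age=94670800"})` returns an empty list.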
  • Mean Time To Curl (photo: http://www.flickr.com/photos/wwarby/3297205226)
  • curl -i -H 'Host: img0.etsystatic.com' global-ssl.fastly.net/someimage.jpg
        HTTP/1.1 200 OK
        Server: Apache
        Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
        Cache-Control: max-age=94670800
        [...]
        X-Served-By: cache-lo82-LHR
        X-Cache: MISS
        X-Cache-Hits: 0
  • curl -i -H 'Host: img0.etsystatic.com' global-ssl.fastly.net/someimage.jpg
        HTTP/1.1 200 OK
        Server: Apache
        Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
        Cache-Control: max-age=94670800
        [...]
        X-Served-By: cache-lo82-LHR
        X-Cache: HIT
        X-Cache-Hits: 1
  • Mean Time To Curl = Done (https://www.etsy.com/listing/99871278)
  • Mean Time To Curl
      ▪ No need to touch existing infrastructure
      ▪ Smoke test of functionality
      ▪ 10 minutes from configuration to curl
      ▪ New providers should be plug and play
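
The two curl responses above differ only in `X-Cache` and `X-Cache-Hits`: a miss on the first request, a hit on the second. A smoke test along those lines could be scripted roughly as follows. This is a sketch that works on captured `curl -i` output rather than making requests; the function names are hypothetical, and the `X-Cache` header is what this provider returned on the slides, so other providers may expose cache status differently.

```python
def parse_headers(raw):
    """Parse the header block of `curl -i` output into a dict with lowercase keys."""
    lines = raw.strip().splitlines()
    headers = {}
    for line in lines[1:]:  # skip the status line
        if not line.strip():
            break  # a blank line ends the header block
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def warmed_up(first_raw, second_raw):
    """True when the first response was a cache MISS and the second a HIT."""
    first = parse_headers(first_raw)
    second = parse_headers(second_raw)
    return first.get("x-cache") == "MISS" and second.get("x-cache") == "HIT"


FIRST = """HTTP/1.1 200 OK
Server: Apache
Last-Modified: Sat, 09 Nov 2013 23:43:38 GMT
Cache-Control: max-age=94670800
X-Served-By: cache-lo82-LHR
X-Cache: MISS
X-Cache-Hits: 0
"""

SECOND = FIRST.replace("X-Cache: MISS", "X-Cache: HIT").replace(
    "X-Cache-Hits: 0", "X-Cache-Hits: 1"
)
```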
  • Testing In Production (photo: http://www.flickr.com/photos/solarnu/10646426865)
  • Testing with Production Traffic
      ▪ Images only at first
      ▪ Good test of caching performance
      ▪ Easy to test by swapping hostnames
      ▪ Made even easier with our A/B testing framework
  • A/B Test Framework
      ▪ Fine grained control
      ▪ Enable test for specific users or groups
      ▪ Percentage of users
      ▪ All controlled via configuration in code
      ▪ Rapid and complete rollback
  • Configure Mappings to CDNs
        $server_config["image"] = array(
            'akamai' => array(
                'img0-ak.etsystatic.com',
                'img1-ak.etsystatic.com',
            ),
            'edgecast' => array(
                'img0-ec.etsystatic.com',
                'img1-ec.etsystatic.com',
            ),
            'fastly' => array(
                'img0-f.etsystatic.com',
                'img1-f.etsystatic.com',
            ),
        );
  • Test Controls
        $server_config['ab']['cdn'] = array(
            'enabled' => 'on',
            'weights' => array(
                'akamai' => 0.0,
                'edgecast' => 0.0,
                'fastly' => 0.0,
                'origin' => 100.0,
            ),
            'override' => 'cdn_diversity',
        );
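
The slides do not show how the A/B framework turns these weights into a choice per user. One common approach, sketched here purely as an assumption, is to hash a stable user identifier onto the cumulative weights so that each user always lands on the same provider. `assign_provider` and the use of SHA-256 are illustrative and are not Etsy's implementation.

```python
import hashlib


def assign_provider(user_id, weights):
    """Deterministically map a user to a provider in proportion to the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    # Turn the hash into a point in [0, total).
    point = (int(digest, 16) % 10_000) / 10_000 * total
    cumulative = 0.0
    for provider, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return provider
    return provider  # only reached through floating point rounding


# The weights from the slide: everything still goes to origin.
SLIDE_WEIGHTS = {"akamai": 0.0, "edgecast": 0.0, "fastly": 0.0, "origin": 100.0}
```

With the slide's weights every user maps to `origin`; ramping a provider up is then only a configuration change, which is what makes rollback quick.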
  • Metrics and Monitoring (photo: http://www.flickr.com/photos/nicolasfleury/6073151084)
  • Metrics and Monitoring: Even if it doesn’t move, graph it anyway
  • Metrics and Monitoring: Simplest approach is the provider’s dashboards
  • Metrics and Monitoring
      ▪ Get more detail by pulling metrics in house
      ▪ Write script to pull data from API
      ▪ Create dashboards with data
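
A script that pulls metrics in house typically ends by writing datapoints to Graphite. The sketch below assumes the provider API call has already returned a dictionary of numbers; the metric names, the `cdn.fastly` prefix and the default host are illustrative and do not come from the talk. The line format is Graphite's plaintext protocol (`path value timestamp`, normally on port 2003).

```python
import socket
import time


def graphite_lines(prefix, metrics, timestamp=None):
    """Format a dict of metrics as Graphite plaintext protocol lines."""
    ts = int(time.time() if timestamp is None else timestamp)
    return [f"{prefix}.{name} {value} {ts}" for name, value in sorted(metrics.items())]


def send_to_graphite(lines, host="localhost", port=2003):
    """Send preformatted lines to a Graphite (carbon) plaintext listener."""
    payload = ("\n".join(lines) + "\n").encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `graphite_lines("cdn.fastly", {"requests": 1200}, timestamp=1383980000)` yields the single line `cdn.fastly.requests 1200 1383980000`.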
  • Testing Plan
      1. for c in $cdns; do rampup $c; done;
      2. Deliberately slow and steady
      3. Watch traffic increase
      4. Watch origin offload increase
      5. Watch performance
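
The `rampup` command in step 1 is slide shorthand, and the talk does not give the actual step sizes. As a hedged illustration of "slow and steady", the sketch below generates the sequence of weight settings that would move one provider from 0% to a target share, with the remainder staying on origin. The function name, step size and target are all assumptions.

```python
def ramp_steps(provider, target_percent, step_percent):
    """Yield weight dicts that shift traffic from origin to one provider in steps."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be in [0, 100]")
    current = 0
    while current < target_percent:
        # Never overshoot the target on the last step.
        current = min(current + step_percent, target_percent)
        yield {provider: float(current), "origin": float(100 - current)}
```

Each yielded dictionary has the same shape as the `weights` array in the test controls, so a step can be applied, watched on the graphs, and only then followed by the next one.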
  • Downsides of this approach
      ▪ A/B testing can’t be used for main site
      ▪ Exposing your test CNAMEs
          - Especially if hotlinking is a concern
  • How do you know it’s broke?
      ▪ Check the graphs!
      ▪ Check with your community
      ▪ Keep support in the loop
  • Operationalising (photo: http://www.flickr.com/photos/98047351@N05/9706165200)
  • Content Partitioning
  • Etsy’s site partitioning
      ▪ Dynamic HTML Content: www.etsy.com
      ▪ Static Assets (js, css, fonts): site.etsystatic.com
      ▪ Listing Images, Avatars: imgX.etsystatic.com
  • Balancing Traffic in Production (photo: http://www.flickr.com/photos/wok_design/2499217405)
  • Balancing Traffic Using DNS
      ▪ Traffic Manager
      ▪ Extends DNS to dynamically return records based on rules
      ▪ Weighted round robin
  • Balancing Traffic Using DNS
        [2589:~] $ dig +short www.etsy.com
        etsy.com.
        38.123.123.123
        [2589:~] $ dig +short www.etsy.com
        www.etsy.com.edgekey.net.
        e2463.b.akamaiedge.net.
        23.74.122.37
        [2589:~] $ dig +short www.etsy.com
        cs34.adn.edgecastcdn.net.
        93.184.219.54
        [2589:~] $ dig +short www.etsy.com
        global-ssl.fastly.net.
        185.31.19.184
  • Balancing Traffic Using DNS
      ▪ Rule updates typically made via web UI
      ▪ Can be slow and error prone
      ▪ Changes need to be applied to all three domains
      ▪ API available to make changes programmatically
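
To make "weighted round robin" concrete, here is a toy sketch of how a set of weighted CNAME targets could be rotated. It only illustrates the idea: the traffic manager's real selection algorithm is not described in the talk, and the integer weights below are invented. The record names are the ones visible in the dig output above.

```python
import itertools
from collections import Counter


def weighted_round_robin(weights):
    """Return an endless iterator over names, each repeated per its integer weight."""
    pool = [name for name, weight in weights.items() for _ in range(weight)]
    if not pool:
        raise ValueError("at least one record needs a positive weight")
    return itertools.cycle(pool)


# Hypothetical weights: half the answers to one provider, a quarter to each other.
EXAMPLE_WEIGHTS = {
    "www.etsy.com.edgekey.net.": 2,
    "cs34.adn.edgecastcdn.net.": 1,
    "global-ssl.fastly.net.": 1,
}


def answer_counts(weights, queries):
    """Count which record would be returned over a number of queries."""
    return Counter(itertools.islice(weighted_round_robin(weights), queries))
```

Setting a record's weight to 0 removes it from rotation, which is the DNS-level equivalent of taking a provider out of service.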
  • cdncontrol (photo: http://www.flickr.com/photos/foshydog/4441105829)
  • DNS balancing downsides
      ▪ Low TTLs for fast convergence
      ▪ Mo QPS == Mo Money
      ▪ More DNS lookups for users
      ▪ Not 100% instant or deterministic
  • 50% within 1 minute; Long Tail is Loooong
  • Monitoring in Production (photo: http://www.flickr.com/photos/9229426@N05/5160787240)
  • Whoopsie Page
      ▪ Static HTML delivered for 5xx errors
          - Branding
          - Translated error messages
          - Links to status page
  • Failure Beacons
      1. 1x1 tracking pixel embedded in page
            [...]
            <img src="//failure.etsy.com/status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com">
            </body>
            </html>
      2. Request creates an access log line
      3. Scrape them out minutely using logster
            self.reg = re.compile(r'^\S+(\s:)? (?P<remote_addr>[0-9.]+),? [0-9.,\- ]+ \[[^\]]+\] "GET /status/images/beacon\.gif\?(beacon_)?source=(?P<source>\S+) HTTP/1\.\d" \d+ [\d-]+ "(?P<referrer>[^"]+)" "(?P<user_agent>[^"]+)" .*$')
      4. Logster posts event counts to Graphite
      5. Alert on Graphite graph in Nagios
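
The five steps above can be sketched end to end in a few lines. This is a standalone illustration, not a logster parser: it uses a simplified pattern that looks only at the request part of each log line, and the warning and critical thresholds are invented. The numeric codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import re
from collections import Counter

# Simplified pattern: matches only the beacon request, not the full log format.
BEACON_RE = re.compile(
    r'"GET /status/images/beacon\.gif\?(?:beacon_)?source=(?P<source>[^&\s"]+)'
)


def count_beacons(log_lines):
    """Count beacon hits per source across an iterable of access log lines."""
    counts = Counter()
    for line in log_lines:
        match = BEACON_RE.search(line)
        if match:
            counts[match.group("source")] += 1
    return counts


def nagios_status(count, warn, crit):
    """Map a per-minute beacon count to a Nagios exit code and label."""
    if count >= crit:
        return 2, "CRITICAL"
    if count >= warn:
        return 1, "WARNING"
    return 0, "OK"
```

In the setup described in the talk the counts go to Graphite first and Nagios alerts on the graph; the threshold function here only stands in for that last step.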
  • Failure Beacons
      ▪ Client IP address can be geolocated
  • Failure Beacons
      ▪ Optional extra debugging information
            [31/Oct/2013:07:06:42 +0000] "GET /status/images/beacon.gif?beacon_source=fastly_origin_failure-etsy.com&provider_error=Connection%20timed%20out&server_identity=cache-ny57-NYC HTTP/1.1"
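
The extra debugging fields travel as ordinary query parameters, so they can be recovered from the logged request line with the standard library. A minimal sketch, assuming the request line has been extracted from the log entry as a single string; `beacon_details` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def beacon_details(request_line):
    """Return the beacon's query parameters from an HTTP request line."""
    _method, target, _protocol = request_line.split(" ", 2)
    query = parse_qs(urlsplit(target).query)
    # Keep the first value of each parameter; the beacon sends each one once.
    return {key: values[0] for key, values in query.items()}


REQUEST_LINE = (
    "GET /status/images/beacon.gif"
    "?beacon_source=fastly_origin_failure-etsy.com"
    "&provider_error=Connection%20timed%20out"
    "&server_identity=cache-ny57-NYC HTTP/1.1"
)
```

Here `beacon_details(REQUEST_LINE)["provider_error"]` decodes to `Connection timed out`, and `server_identity` names the cache node that served the error page.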
  • Tracking Requests to Origin
        GET / HTTP/1.1
        User-Agent: curl/7.24.0
        Accept: */*
        X-Forwarded-Host: www.etsy.com
        [...]
        X-CDN-Provider: edgecast
        [...]
        Host: www.etsy.com
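
With each CDN tagging its origin requests through `X-CDN-Provider`, traffic can be broken down per provider at the origin. The sketch below assumes request headers are available as dictionaries; the `direct` label for requests without the header is an assumption made for illustration.

```python
from collections import Counter


def provider_of(headers, default="direct"):
    """Return the CDN named in X-CDN-Provider, or a default when it is absent."""
    for name, value in headers.items():
        if name.lower() == "x-cdn-provider":
            return value.strip().lower()
    return default


def requests_by_provider(requests):
    """Count origin requests per CDN provider from an iterable of header dicts."""
    return Counter(provider_of(headers) for headers in requests)
```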
  • Backend Monitoring
      ▪ Logster on CDN provider header
      ▪ Vendor APIs to bring data in house
      ▪ Data in-house benefits include
          - Integration with our anomaly detection systems
          - Consistent and unified view of all CDN metrics
          - We control data retention period
  • Awareness
      ▪ Over 100 engineers
      ▪ Deploying 60 times a day
      ▪ Correlating external and internal services
  • Awareness: Deploy lines
  • Frontend Monitoring
      ▪ Performance is important to us
      ▪ Monitoring overall site performance
      ▪ Monitoring performance by CDN provider
      ▪ Real User Monitoring (SOASTA mPulse) on key pages to track real user page performance
  • Downsides (photo: http://www.flickr.com/photos/39272170@N00/3841286802)
  • Debugging: What broke?
      ▪ MTTD/MTTR can be extremely low with this system
      ▪ But not always
  • Debugging: What broke?
      ▪ Non technical member base
      ▪ Confusing and time consuming
      ▪ Amazing support team
      ▪ Log as much information as possible
  • Conclusions/Takeaways (photo: http://www.flickr.com/photos/sk8geek/4649776194)
  • Great success
      ▪ 12 months in, the benefits have far outweighed the few downsides
      ▪ We’re continuing to evolve the system
      ▪ We’ll be sure to share our experience with the community along the way
  • Links/Open Source
      ▪ cdncontrol
          - http://github.com/etsy/cdncontrol
          - http://github.com/etsy/cdncontrol_ui
      ▪ logster
          - http://github.com/etsy/logster
      ▪ CDN API to Graphite scripts
          - http://github.com/lozzd/cdn_scripts
  • Thanks! Questions?