MED301 Is My CDN Performing? - AWS re:Invent 2012

This presentation provides practical guidance using external agent-based measurements and real user monitoring techniques. We review common content delivery network (CDN) architectures and how they relate to performance measurement. Finally, we walk through real-world CDN performance monitoring implementations used by MapBox, Amazon.com, and Amazon CloudFront.

  1. 1. #reinvent
  2. 2. Why Measure? “For e-commerce sites, a one-second delay in web page response time decreased conversions by 7%” Source: Aberdeen Group. Source: Compuware Gomez.
  3. 3. What are they downloading? Where are they? How does that influence my measurement?
  4. 4. Identify Customer Vital Signs: Customer Experience Metrics, Diagnostic Metrics
  5. 5. Identify Customer Experience Metrics: Availability, Download Latency
  6. 6. Availability: Customers want to reach your content. (Chart: # of concurrent requests)
  7. 7. Download Latency: Customers want fast downloads. (Chart: Avg. US Latency, 1.8 KB Object; axis ticks 320 to 420)
  8. 8. Service-Side External Real User
  9. 9. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: ? / ? / ?
        External: ? / ? / ?
        Real User: ? / ? / ?
  10. 10. Service-Side External Real User
  11. 11. Service-Side Measurement • First-byte latency • Server utilization • Edge-cache hits • W3C access logs
  12. 12. Service-Side Measurement. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: D / B / A
        External: ? / ? / ?
        Real User: ? / ? / ?
  13. 13. Service-Side External Real User
  14. 14. External Measurement GET /test_object.png Amazon CloudFront
  15. 15. External Measurement
  16. 16. Path of a typical web request (Amazon CloudFront): Web Client, DNS Server, CDN Cache, Web Content
  17. 17. First Request: GET /cat_video.mp4; Miss: 100 ms; Lookup video.amazon.com; Miss: 1 sec. (Amazon CloudFront: Web Client, DNS Server, CDN Cache, Web Content)
  18. 18. Second Request: Cache is Primed. GET /cat_video.mp4 (Amazon CloudFront: Web Client, DNS Server, CDN Cache, Web Content)
  19. 19. Amazon CloudFront: Web Client, DNS Server, CDN Cache, Web Content
  20. 20. Amazon CloudFront: Web Client, DNS Server, Network, CDN Cache, Web Content
  21. 21. 23 ms 300 ms 42 ms
  22. 22. Object Popularity Viewer Network Location Performance Viewer Experience
  23. 23. Internet Backbone 2 ms Regional ISP Amazon CloudFront 40 ms Metro ISP
  24. 24. External Measurement. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: D / B / A
        External: C / B / B
        Real User: ? / ? / ?
  25. 25. Service-Side External Real User
  26. 26. Real-User Measurement Download Time
  27. 27. Real User Measurement: Telemetry Data, Web Browser, Monitoring Aggregator
  28. 28. var startTime = new Date().getTime();
          var testObjectUrl = "/test_object.png";
          // Measure round trip to the edge
          $.ajax({
            url: testObjectUrl,
            success: function(data) {
              var endTime = new Date().getTime();
              // publishMeasurement is not defined on the slide; presumably it reports the elapsed milliseconds to the monitoring aggregator
              publishMeasurement(endTime - startTime);
            }
          });
  29. 29. Same snippet as slide 28, annotated "Request Test Object".
  30. 30. Same snippet as slide 28, annotated "Wrapped in JavaScript Timers*". *More info on JavaScript timers: http://ejohn.org/blog/accuracy-of-javascript-time/
  31. 31. Real User Measurement. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: D / B / A
        External: C / B / B
        Real User: A / C / C+
  32. 32. Real User Measurement 2.0:
  33. 33. Real User Measurement 1.0. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: D / B / A
        External: C / B / B
        Real User: A / C / C+
  34. 34. Real User Measurement 2.0. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: D / B / A
        External: C / B / B
        Real User: A / C / A
  35. 35. Measurement Summary. Columns: Customer Experience / Ease of Setup / Diagnostic Data
        Service-Side: D / B / A
        External: C / B / B
        Real User: A / C / A
        Get your customer experience metrics here
  36. 36. Cache Busting, Object Size
  37. 37. • Edge selection: connect time • Cache server performance: first-byte latency (FBL) of hits • Network performance: last-byte latency (LBL) of hits >100 KB • Origin fetch: FBL of misses
  38. 38. • Edge selection: connect time • Cache server performance: FBL of hits • Network performance: LBL of hits >100 KB • Origin fetch: FBL of misses
  39. 39. • Edge selection: connect time • Cache server performance: FBL of hits • Network performance: LBL of hits >100 KB • Origin fetch: FBL of misses
  40. 40. • Edge selection: connect time • Cache server performance: FBL of hits • Network performance: LBL of hits >100 KB • Origin fetch: FBL of misses
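
The diagnostics listed on these slides could be derived from browser timing data along the following lines. This is a minimal sketch, assuming an entry shaped like a W3C Resource Timing entry (`connectStart`, `connectEnd`, `requestStart`, `responseStart`, `responseEnd`, all in milliseconds). The helper name `diagnosticMetrics`, the sample values, and the choice to measure FBL and LBL from `requestStart` are assumptions; the slides do not specify them.

```javascript
// Hypothetical helper: derive the slide's diagnostic metrics from one timing entry.
// Field names follow the W3C Resource Timing attributes; values are in milliseconds.
function diagnosticMetrics(entry) {
  return {
    // Edge selection: time spent establishing the TCP connection.
    connectTime: entry.connectEnd - entry.connectStart,
    // First-byte latency (FBL): request sent until the first response byte arrives.
    firstByteLatency: entry.responseStart - entry.requestStart,
    // Last-byte latency (LBL): request sent until the final response byte arrives.
    lastByteLatency: entry.responseEnd - entry.requestStart,
  };
}

// Made-up sample values, for illustration only.
const sample = {
  connectStart: 10, connectEnd: 35,
  requestStart: 36, responseStart: 78, responseEnd: 240,
};
const metrics = diagnosticMetrics(sample);
console.log(metrics);
```

For a test object served from another origin, browsers report these detailed fields as zero unless the response carries a `Timing-Allow-Origin` header, so a real deployment would likely need that header on the CDN responses.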
  41. 41. ~1 KB: 2 to 3 Round Trips (0-1 DNS Lookup, 1 TCP Connect, 1 Object Download). Tests Client to Edge Latency.
        100 KB+: 7 to 8 Round Trips (0-1 DNS Lookup, 1 TCP Connect, 6 Object Download). Tests Client to Edge Latency & Network Quality.
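
One way to see where round-trip counts like these can come from is an idealized TCP slow-start model. The sketch below is an illustration, not the presenters' calculation: it assumes 1460-byte segments, an initial congestion window of 2 segments, a window that doubles every round trip, and no packet loss. The function name `downloadRoundTrips` is hypothetical.

```javascript
// Estimate the round trips needed to download an object under idealized TCP slow start.
// Assumptions (not from the slides): 1460-byte segments, an initial congestion
// window of 2 segments, the window doubling every round trip, and no packet loss.
function downloadRoundTrips(objectBytes, initialWindow = 2, segmentBytes = 1460) {
  let remaining = Math.ceil(objectBytes / segmentBytes); // segments still to deliver
  let windowSize = initialWindow;
  let roundTrips = 0;
  while (remaining > 0) {
    remaining -= windowSize; // one window of segments is delivered per round trip
    windowSize *= 2;
    roundTrips += 1;
  }
  return roundTrips;
}

const smallObject = downloadRoundTrips(1 * 1024);
const largeObject = downloadRoundTrips(100 * 1024);
console.log(smallObject, largeObject);
```

Under these assumptions the model gives 1 round trip for a 1 KB object and 6 for a 100 KB object, in line with the slide's object-download counts. A larger initial window would lower the figure for the large object.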
  42. 42. Stages: DNS Lookup, TCP Connect, Edge Cache, Client Cache. Treatment of each: Viewer-Driven, Bust, or Prime.
        Bust, DNS rotation example: test1.example.com, then test2.example.com
        Prime, two-requests example: GET /test_100k.png, then GET /test_100k.png
        Bust, query string example: GET /test_100k.png?ver=1, then GET /test_100k.png?ver=2
        Prime, two-requests example: test.example.com, then test.example.com
        Viewer-Driven, one-request example: GET /test_100k.png
        Bust, one-request w/ CDN Config example: GET /test_100k.png?ver=1
  43. 43. http://rum1.internetkitties.com/test_100k.png?cache_bust=16349583&rid=345-04324-034533 1. Don’t Prime Edge Cache
  44. 44. http://rum1.internetkitties.com/test_100k.png?cache_bust=16349583&rid=345-04324-034533 1. Don’t Prime Edge Cache 2. Fresh TCP Connection
  45. 45. http://rum1.internetkitties.com/test_100k.png?cache_bust=16349583&rid=345-04324-034533 1. Don’t Prime Edge Cache 2. Fresh TCP Connection 3. 100KB Object
  46. 46. http://rum1.internetkitties.com/test_100k.png?cache_bust=16349583&rid=345-04324-034533 1. Don’t Prime Edge Cache 2. Fresh TCP Connection 3. 100KB Object 4. Bust Client Cache
  47. 47. http://rum1.internetkitties.com/test_100k.png?cache_bust=16349583&rid=345-04324-034533 1. Don’t Prime Edge Cache 2. Fresh TCP Connection 3. 100KB Object 4. Bust Client Cache 5. Diagnostics via Nav Timing API
  48. 48. http://rum1.internetkitties.com/test_100k.png?cache_bust=16349583&rid=345-04324-034533 1. Don’t Prime Edge Cache 2. Fresh TCP Connection 3. 100KB Object 4. Bust Client Cache 5. Diagnostics via Nav Timing API 6. **Use a Request ID**
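
A test URL like the one on these slides could be assembled as follows. This is a minimal sketch: the helper name `buildTestUrl` is hypothetical, the parameter names `cache_bust` and `rid` mirror the slide's example URL, and the purpose of the request ID (matching a browser measurement to other records) is an assumption.

```javascript
// Hypothetical helper: build a test-object URL that carries a cache-busting
// value and a request ID, mirroring the example URL on the slide.
function buildTestUrl(baseUrl, requestId, nonce = Date.now()) {
  const url = new URL(baseUrl);
  // A value that changes per request keeps the browser from answering from its cache.
  // Whether it also bypasses the edge cache depends on how the CDN treats query strings.
  url.searchParams.set("cache_bust", String(nonce));
  // Assumed purpose: lets one measurement be matched to other records for the same request.
  url.searchParams.set("rid", requestId);
  return url.toString();
}

const testUrl = buildTestUrl(
  "http://rum1.internetkitties.com/test_100k.png",
  "345-04324-034533",
  16349583
);
console.log(testUrl);
```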
  49. 49. Visible, Actionable
  50. 50. Geographic Regions Clients/Devices CDNs Viewer Networks
  51. 51. Geographic Regions Clients/Devices CDNs Viewer Networks
  52. 52. Regional Partition Uncovers Hidden Insights. (Charts: # of requests over time, GLOBAL; # of requests over time, PER REGION: Europe, US, Asia)
  53. 53. Geographic Regions Clients/Devices CDNs Viewer Networks
  54. 54. Geographic Regions, Clients/Devices, CDNs (Amazon CloudFront, CDN A, CDN B), Viewer Networks
  55. 55. Geographic Regions Clients/Devices CDNs Viewer Networks
  56. 56. Segmentation Schemes Geographic Regions Clients/Devices CDNs Viewer Networks Image source: The Opte Project
  57. 57. Viewer Network Segmentation 23 ms 27 ms 42 ms
  58. 58. Viewer Network Segmentation 23 ms 300 ms 42 ms
  59. 59. Metric Sorted Lists Viewer Networks Object
  60. 60. Metric Sorted Lists Viewer Networks Object Image source: The Opte Project
  61. 61. Isolate Outliers With Sorted Lists Network ID Last Byte Latency Viewer Networks Object
  62. 62. Isolate Outliers With Sorted Lists Viewer Networks Object
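
The sorted-list idea can be sketched as below: group last-byte latency samples by viewer network, summarize each network, and sort worst first so outliers rise to the top. Using the median as the summary is an assumption, as are the function names and the network IDs; the latency values echo the 23 ms / 300 ms / 42 ms slide.

```javascript
// Median of a list of numbers (does not modify the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group samples by viewer network and sort networks by median latency, worst first.
function rankNetworksByLatency(samples) {
  const byNetwork = new Map();
  for (const { networkId, lastByteLatencyMs } of samples) {
    if (!byNetwork.has(networkId)) byNetwork.set(networkId, []);
    byNetwork.get(networkId).push(lastByteLatencyMs);
  }
  return [...byNetwork.entries()]
    .map(([networkId, values]) => ({ networkId, medianMs: median(values) }))
    .sort((a, b) => b.medianMs - a.medianMs);
}

// Hypothetical samples: the network IDs are invented for illustration.
const ranked = rankNetworksByLatency([
  { networkId: "net-a", lastByteLatencyMs: 23 },
  { networkId: "net-b", lastByteLatencyMs: 300 },
  { networkId: "net-c", lastByteLatencyMs: 42 },
  { networkId: "net-b", lastByteLatencyMs: 310 },
]);
console.log(ranked);
```

The slow network lands at the head of the list, which is the point of the technique: an average over all three networks would blur it.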
  63. 63. MapBox: Design maps in the cloud, publish in minutes
  64. 64. 256x256 (png/jpg)
  65. 65. Maps as HTTP
        Content-Type: image/jpg
        Content-Length: 33049
        Connection: keep-alive
        Cache-Control: max-age=7200
        Date: Fri, 19 Oct 2012 13:43:11 GMT
        ETag: "6ff3ca335cb686dae60905b20..."
        Last-Modified: Tue, 18 Sep 2012 02:20:00 GMT
        Age: 4
        X-Cache: Hit from cloudfront
  66. 66. “Scale” You Say? • A popular map can do 50+ million tiles a day • 50e6 / 864e2 = ~580 req/sec • I have tens of thousands of maps
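
The arithmetic on this slide can be checked in a few lines. The variable names are illustrative; the figures come from the slide.

```javascript
// Convert a daily request count into an average per-second rate.
const SECONDS_PER_DAY = 24 * 60 * 60; // 86,400, written as 864e2 on the slide
const tilesPerDay = 50e6;             // "50+ million tiles a day"
const requestsPerSecond = tilesPerDay / SECONDS_PER_DAY;
console.log(requestsPerSecond.toFixed(1));
```

This is a daily average of roughly 580 requests per second for a single popular map.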
  67. 67. Dimensions of Scale
  68. 68. Dimensions of Scale
  69. 69. Maps as Utility • Foursquare • Weather (WDT, USA Today) • Airplane pilot maps (ForeFlight)
  70. 70. Maps as Utility • Our customers' maps are critical elements of their service • No maps isn’t an option for them
  71. 71. On AWS from the Start • Benchmarking the whole time • For example: `c1.xlarge` is our sweet spot • We tried everything… • …with every configuration we could think of
  72. 72. On AWS from the Start • Use Amazon CloudFront as our CDN • Got business pressure to switch... • So, we benchmarked (Jan/Feb 2012)
  73. 73. CDN Evaluation: What do we care about?
  74. 74. CDN Evaluation • Speed • Per-GB cost, setup fees & minimums • Log delivery • Cache purging • Support quality
  75. 75. CDN Evaluation • Akamai • Amazon CloudFront • EdgeCast • Limelight • HighWinds
  76. 76. CDN Benchmarking • Objects are almost exclusively images • Very difficult to measure existing users
  77. 77. CDN Benchmarking • External monitoring • Page loads & JS evaluation • 64 monitoring stations • Global distribution
  78. 78. CDN Benchmarking • Two identical stacks • Simultaneously • Same map, same max-age, etc. • 24 hours
  79. 79. CDN Benchmarking • Spot tested with real people across the US, Europe, and South America
  80. 80. CDN Benchmarking
  81. 81. CDN Evaluation • Speed: Faster • Cost: Lower • Logs: Not FTP, timely delivery • Cache purges: check • Support: check
  82. 82. Ok, CDN is fast, great. All done. Right? • No • Do less • Get closer
  83. 83. Do Less
  84. 84. Do Less • If-Modified-Since • CDN has an expired item, it checks back with the origin • Origin says "oh, you can keep using that" – Just headers
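
The revalidation exchange described here can be sketched as an origin-side decision between `304 Not Modified` (headers only) and `200` (full body). This is an illustration, not MapBox's implementation: the function name `conditionalStatus` is hypothetical, and the `Last-Modified` value is borrowed from the "Maps as HTTP" slide.

```javascript
// Hypothetical origin-side check: decide between 304 (headers only) and 200 (full body).
function conditionalStatus(ifModifiedSince, lastModified) {
  if (!ifModifiedSince) return 200; // unconditional request: send the full object
  const since = Date.parse(ifModifiedSince);
  const modified = Date.parse(lastModified);
  if (Number.isNaN(since) || Number.isNaN(modified)) return 200; // unparsable date: send the body
  return modified <= since ? 304 : 200; // unchanged since the cached copy: 304 Not Modified
}

const lastModified = "Tue, 18 Sep 2012 02:20:00 GMT"; // value from the "Maps as HTTP" slide
const revalidation = conditionalStatus("Tue, 18 Sep 2012 02:20:00 GMT", lastModified);
const staleCopy = conditionalStatus("Mon, 17 Sep 2012 00:00:00 GMT", lastModified);
const firstFetch = conditionalStatus(undefined, lastModified);
console.log(revalidation, staleCopy, firstFetch);
```

When the CDN's copy is still current the origin answers 304 and skips rendering and sending the tile again, which is the "do less" being described.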
  85. 85. Do Less • Ability to independently, quickly query metadata for resource components • Know validity before loading full resource • Sometimes this is dismissed as impractical, more often worth the work
  86. 86. Do Less
  87. 87. Do Less
  88. 88. Do Less
  89. 89. Get Closer http://www.flickr.com/photos/lightsniper
  90. 90. Get Closer • Even electrons take time to cross an ocean • CDN doesn't have to go as far on misses
  91. 91. Get Closer • Two independent clusters: Ireland & Virginia • Master-master replication • DNS: anycast / latency-based / geo-based routing
  92. 92. Get Closer • AWS CloudFormation • Config management (Puppet) • Automated deploys
  93. 93. Traffic: US vs. EU
  94. 94. Get Closer
  95. 95. Did That Help? • IMS (If-Modified-Since) - 10x • Proximity - 6x
  96. 96. tl;dr • Know what matters to your service • Measure all the things • Be a good upstream to your CDN
  97. 97. http://www.stevesouders.com/blog/2012/11/14/comparing-rum-synthetic-page-load-times/
  98. 98. www.aws.amazon.com/cloudfront www.aws.amazon.com/cloudfront/dynamic-content
  99. 99. #reinvent We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.
