
API Caching, why your server needs some rest

The best HTTP request made to your server is the one that never reaches it. Do you know the lifetime of your resources? How can you be sure that users never receive an expired response, without opening a connection to the origin server? What kinds of caches exist, and when should you use each one? Why shouldn't you be afraid to read the RFCs? This talk presents good practices for using HTTP caching in APIs and web applications, so that your digital products make better use of machines and save money.

  1. APIs Caching: Why your server needs some rest. RubyConf Brazil 2013, @lfcipriani, 2013-08-30 (@TwitterAds | Confidential)
  2. Who? @lfcipriani
  3. What?
  4. Scope of this presentation: • caching in a distributed system • the flows of HTTP caching and how to control them • good and bad practices
  5. Scope of this presentation: if you need a friendly introduction to the caching part of RFC 2616, see http://www.slideshare.net/lfcipriani/fearless-http-requests-abuse
  6. Definitions and Motivations
  7. Analogy: memorizing phone numbers vs. checking the phonebook every time
  8. Network effect: Welcome to the first year of Software Engineering... where every request gets a response without failure and the whole network is reliable and fast. (Source: "First Day on the Internet Kid", Know Your Meme)
  9. What problems does caching help solve? • redundant and unnecessary data traffic • network bottlenecks • heavy load (or load spikes) on the origin server • long network latency
  10. Motivations: HTTP Archive trends, all sites vs. top 1000 (chart). Source: http://httparchive.org/trends.php?s=All&minlabel=Jan+20+2011&maxlabel=Aug+15+2013
  11. Motivations: HTTP Archive cache lifetime, all sites vs. top 100 (chart). Source: http://httparchive.org/interesting.php?a=All&l=Aug%2015%202013&s=Top100
  12. HTTP Caching Protocol
  13. HTTP caching flows
  14. https://vine.co/v/hOuAXTOetuz (bit.ly/vinecaching)
  15. https://vine.co/v/hOuMHbTzp6h (bit.ly/vinecaching)
  16. https://vine.co/v/hOu5g9FVDa5 (bit.ly/vinecaching)
  17. https://vine.co/v/hOuvzinwrt6 (bit.ly/vinecaching)
  18. The cache headers zoo. Source: http://www.slideshare.net/lfcipriani/fearless-http-requests-abuse
  19. Cache Coherency
  20. What's cache coherency? In a distributed system, only the origin server knows the state of a resource with certainty, so caches and other components must ensure that a cached response is still fresh before returning it to the client. Because of this complexity, keeping caches coherent in a distributed system has a high cost.
  21. Better safe than sorry: strong consistency. Maintain coherency by revalidating every request with the origin server.
  22. Living dangerously: weak consistency. The cache has the autonomy to use a heuristic to decide whether a cached response is still fresh, without consulting the origin server. There are basically two types of weak consistency.
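A minimal Python sketch contrasting the two modes, assuming a hypothetical `origin(url, if_none_match)` callable that stands in for the origin server and returns `(status, body, etag)`. `ToyCache`, `fake_origin`, and the 60-second TTL are illustrative names and values, not part of any real library. With `strong=True` every lookup goes back to the origin (cheaply, via a conditional request); with `strong=False` the cache trusts its own TTL and only contacts the origin once an entry goes stale.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin signature: origin(url, if_none_match) -> (status, body, etag)
Origin = Callable[[str, Optional[str]], Tuple[int, str, str]]


@dataclass
class Entry:
    body: str
    etag: str
    stored_at: float  # time at which the response was stored or last revalidated


class ToyCache:
    def __init__(self, origin: Origin, strong: bool, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.origin = origin
        self.strong = strong  # True: revalidate on every request
        self.ttl = ttl        # weak consistency: seconds an entry is trusted locally
        self.clock = clock
        self.store: Dict[str, Entry] = {}
        self.origin_calls = 0

    def get(self, url: str) -> str:
        entry = self.store.get(url)
        if entry is not None and not self.strong:
            # Weak consistency: decide freshness locally, no origin round trip.
            if self.clock() - entry.stored_at < self.ttl:
                return entry.body
        # Strong consistency, or a missing/stale entry: ask the origin, sending
        # the cached ETag so it can answer 304 Not Modified instead of the body.
        self.origin_calls += 1
        status, body, etag = self.origin(url, entry.etag if entry else None)
        if status == 304 and entry is not None:
            entry.stored_at = self.clock()
            return entry.body
        self.store[url] = Entry(body, etag, self.clock())
        return body


def fake_origin(url: str, if_none_match: Optional[str]) -> Tuple[int, str, str]:
    etag = '"v1"'  # the resource never changes in this demo
    if if_none_match == etag:
        return 304, "", etag
    return 200, f"representation of {url}", etag


strong_cache = ToyCache(fake_origin, strong=True)
weak_cache = ToyCache(fake_origin, strong=False, ttl=60.0)
for _ in range(3):
    strong_cache.get("/users/1")
    weak_cache.get("/users/1")
print(strong_cache.origin_calls, weak_cache.origin_calls)  # 3 1
```

Strong consistency costs one origin round trip per request even when nothing changed; weak consistency saves those trips at the risk of serving a response that changed at the origin during the TTL.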
  23. Weak consistency: invalidation
  24. Weak consistency: invalidation is bad! • the approach does not scale • the server needs to coordinate with an unknown network of caches • choose 2: immediacy, scalability, reliability • "There are only two hard things in Computer Science: cache invalidation and naming things" (Phil Karlton) • the Two Generals' Problem. Sources: http://www.subbu.org/blog/2010/01/cache-invalidation, http://en.wikipedia.org/wiki/Two_Generals'_Problem
  25. Weak consistency: when to do invalidation. When your network looks like the one pictured on the slide ;-)
  26. Weak consistency: TTL approach
  27. Taming Cache
  28. Topology considerations
  29. Controlling cacheability (protocol-specific considerations):

      | Cache-Control directive | May I cache locally? (1) | May I cache anywhere? | Should I revalidate, even when fresh? |
      |---|---|---|---|
      | no-store | no | no | n/a |
      | private | yes | no | no |
      | no-cache | yes | yes | yes |
      | public | yes | yes | no |

      Notes: (1) "locally" means a cache that serves only one consumer; (2) these directives override any configuration of the cache; (3) by default, responses to safe, non-authenticated requests (GET and HEAD) with status code 200, 203, 206, 300, 301 or 410 may be cached.
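The directive table on slide 29 can be read as a storage decision. Below is a minimal Python sketch of that decision under RFC 2616 rules, assuming simplified inputs: the caller says whether it is a shared cache and whether the request carried `Authorization`. `may_store` and `must_revalidate_when_fresh` are hypothetical helpers, directive arguments (such as `private="Set-Cookie"`) are ignored, and `Expires` is not considered.

```python
from typing import Set

# Status codes RFC 2616 (section 13.4) lets a cache store without explicit freshness info.
DEFAULT_CACHEABLE_STATUS = {200, 203, 206, 300, 301, 410}


def directives(cache_control: str) -> Set[str]:
    """Directive names in a Cache-Control value; arguments such as =60 are ignored."""
    names = set()
    for part in cache_control.split(","):
        name = part.split("=", 1)[0].strip().lower()
        if name:
            names.add(name)
    return names


def may_store(method: str, status: int, cache_control: str,
              shared_cache: bool, has_authorization: bool = False) -> bool:
    d = directives(cache_control)
    if "no-store" in d:
        return False  # row "no-store": nowhere
    if shared_cache and "private" in d:
        return False  # row "private": only a local cache
    if method not in ("GET", "HEAD"):
        return False  # simplification: responses to unsafe methods are not stored
    if shared_cache and has_authorization and not d & {"public", "s-maxage", "must-revalidate"}:
        return False  # authenticated responses need explicit permission in shared caches
    # Other status codes need an explicit freshness directive (Expires is ignored here).
    return status in DEFAULT_CACHEABLE_STATUS or bool(d & {"public", "private", "max-age", "s-maxage"})


def must_revalidate_when_fresh(cache_control: str) -> bool:
    return "no-cache" in directives(cache_control)  # row "no-cache"
```

For example, `may_store("GET", 200, "private", shared_cache=True)` is false while the same call with `shared_cache=False` is true, which is exactly the "locally yes, anywhere no" row.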
  30. Controlling cacheability (protocol-specific considerations): be aware of the Vary header. If it names a request header whose values are highly diverse, you could fill the cache storage too fast.
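Why a diverse `Vary` header fills storage: a cache keys each stored variant by the URL plus the request's values for every header named in `Vary`. This minimal Python sketch assumes exact string matching of header values (real caches may normalize them), and `cache_key`, `distinct_entries`, and the sample clients are hypothetical.

```python
from typing import Dict, List, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple:
    """Secondary cache key: the URL plus the request's values for each header named in Vary."""
    values = {name.lower(): value for name, value in request_headers.items()}
    varied = sorted({h.strip().lower() for h in vary.split(",") if h.strip()})
    return (url,) + tuple((h, values.get(h, "")) for h in varied)


def distinct_entries(url: str, requests: List[Dict[str, str]], vary: str) -> int:
    """How many separate cache entries these requests would create for one URL."""
    return len({cache_key(url, headers, vary) for headers in requests})


# 1,000 hypothetical clients: 3 distinct encodings, but a unique User-Agent each.
sample_requests = [
    {"Accept-Encoding": ("gzip", "deflate", "identity")[i % 3],
     "User-Agent": f"client-build-{i}"}
    for i in range(1000)
]
print(distinct_entries("/products", sample_requests, "Accept-Encoding"))  # 3
print(distinct_entries("/products", sample_requests, "User-Agent"))       # 1000
```

Varying on `Accept-Encoding` keeps one resource at a handful of entries, while varying on `User-Agent` turns it into roughly one entry per client, with almost no reuse.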
  31. Controlling revalidation (protocol-specific considerations): revalidation is done with conditional requests. • Last-Modified later than If-Modified-Since → 200 • Last-Modified not later than If-Modified-Since → 304 • If-None-Match does not match the ETag → 200 • If-None-Match matches the ETag → 304. You can even decide how revalidation is done.
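A minimal Python sketch of how an origin server might evaluate slide 31's rules for a GET or HEAD request. It assumes the caller passes the resource's current `etag` and a timezone-aware UTC `last_modified`. `conditional_status` is a hypothetical helper, header names are looked up case-sensitively for brevity, and the "If-None-Match wins" precedence follows what RFC 7232 later made explicit.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict


def _opaque(tag: str) -> str:
    """Weak comparison: ignore a W/ prefix (weak validators are fine for If-None-Match)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def conditional_status(request_headers: Dict[str, str], etag: str,
                       last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When both validators are sent, If-None-Match takes precedence.
        tags = {_opaque(t) for t in if_none_match.split(",")}
        return 304 if "*" in tags or _opaque(etag) in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the condition
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # treat zone-less dates as UTC
        # HTTP dates have one-second resolution, so drop microseconds first.
        return 304 if last_modified.replace(microsecond=0) <= since else 200
    return 200
```

A 304 carries no body, so a successful revalidation costs the origin a lookup of the validators instead of rendering and sending the full representation.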
  32. Content-specific considerations: be careful with cookies, and be aware of how your privacy policy influences what's cacheable.
  33. Content life cycle considerations. TL;DR: know the rate of change of your resources and establish a time to live (TTL) for them, with Expires: [date] or Cache-Control: max-age=[seconds].
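A minimal Python sketch of both ways of stating a TTL, assuming the server chooses `ttl_seconds` per resource and passes `now` as a UTC datetime. `ttl_headers` emits `max-age` for HTTP/1.1 caches plus `Expires` for HTTP/1.0 ones, and `freshness_lifetime` reads them back the way RFC 2616 prescribes: `max-age` overrides `Expires`, and an invalid `Expires` counts as already expired. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional


def ttl_headers(ttl_seconds: int, now: Optional[datetime] = None) -> Dict[str, str]:
    """Announce the same TTL both ways; `now` must be a UTC datetime."""
    now = (now or datetime.now(timezone.utc)).replace(microsecond=0)
    return {
        "Date": format_datetime(now, usegmt=True),
        "Cache-Control": f"max-age={ttl_seconds}",
        "Expires": format_datetime(now + timedelta(seconds=ttl_seconds), usegmt=True),
    }


def freshness_lifetime(headers: Dict[str, str]) -> Optional[float]:
    """Seconds a response stays fresh; max-age overrides Expires (RFC 2616, 14.9.3)."""
    for part in headers.get("Cache-Control", "").split(","):
        name, _, value = part.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return float(value)
    if "Expires" in headers and "Date" in headers:
        try:
            delta = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0.0  # an invalid Expires (e.g. "0") means "already expired"
        return max(delta.total_seconds(), 0.0)
    return None  # no explicit TTL: a cache may fall back to a heuristic
```

Computing `Expires` relative to the response's own `Date` keeps the two headers consistent, so HTTP/1.0 and HTTP/1.1 caches arrive at the same lifetime.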
  34. Content life cycle considerations: • TTLs that are too short (seconds) or too long (days) smell bad • TTLs can vary; don't treat them as constant values • don't be afraid to get sophisticated, if needed: the LM-factor heuristic, TTL = (Date - Last-Modified) × factor; prediction models (http://www.slideshare.net/jseidman/real-world-machine-learning-at-orbitz-strata-2011) • control your cache strategy!
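A minimal Python sketch of the LM-factor heuristic from slide 34. The default `factor=0.1` follows the "typical setting of 10%" that RFC 2616 (section 13.2.4) mentions for heuristic expiration. The one-day `cap` is an assumption chosen to echo the slide's warning against TTLs of days, and `lm_factor_ttl` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def lm_factor_ttl(date: datetime, last_modified: datetime, factor: float = 0.1,
                  cap: timedelta = timedelta(days=1)) -> timedelta:
    """Heuristic freshness lifetime: (Date - Last-Modified) * factor, capped."""
    unchanged_for = max(date - last_modified, timedelta(0))  # guard against clock skew
    return min(unchanged_for * factor, cap)


served_at = datetime(2013, 8, 30, 12, 0, tzinfo=timezone.utc)
print(lm_factor_ttl(served_at, served_at - timedelta(hours=10)))  # 1:00:00
print(lm_factor_ttl(served_at, served_at - timedelta(days=90)))   # 1 day, 0:00:00
```

The idea is that a resource that has not changed for a long time probably will not change soon, so its TTL grows with its age, while the cap keeps a long-stable resource from being cached for weeks.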
  35. General considerations: deciding to have NO cache is part of the strategy. Your cache strategy might not be honored by an intermediary cache; no hard feelings about it, it is more common than you think.
  36. Measuring efficiency
  37. Measuring cache efficiency: Hit Rate = cache hits / total requests. It depends on: • how big your cache is • how similar the interests of the cache's users are • the rate of change of the data • how the caches are configured
  38. Measuring cache efficiency: Byte Hit Rate = bytes transferred from cache hits / bytes transferred by all requests
  39. Measuring cache efficiency: • the same metrics can be applied to revalidations • measure per resource • measure continuously and monitor, to improve the strategy
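A minimal Python sketch of both metrics computed per resource and overall, assuming a hypothetical access log in which each entry records the path, a `HIT`/`MISS` cache status, and the bytes sent. The `LogEntry` fields and `cache_efficiency` are illustrative, not the format of any particular cache server.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class LogEntry:
    path: str
    cache_status: str  # hypothetical log field: "HIT" or "MISS"
    bytes_sent: int


def cache_efficiency(entries: Iterable[LogEntry]) -> Dict[str, Dict[str, float]]:
    """Hit rate and byte hit rate per resource, plus an overall row under "*"."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"requests": 0, "hits": 0, "bytes": 0, "hit_bytes": 0})
    for e in entries:
        for key in (e.path, "*"):
            t = totals[key]
            t["requests"] += 1
            t["bytes"] += e.bytes_sent
            if e.cache_status == "HIT":
                t["hits"] += 1
                t["hit_bytes"] += e.bytes_sent
    return {
        key: {
            "hit_rate": t["hits"] / t["requests"],
            "byte_hit_rate": t["hit_bytes"] / t["bytes"] if t["bytes"] else 0.0,
        }
        for key, t in totals.items()
    }


log = [
    LogEntry("/avatar.png", "MISS", 50_000),
    LogEntry("/avatar.png", "MISS", 50_000),
    LogEntry("/status", "HIT", 500),
    LogEntry("/status", "HIT", 500),
]
report = cache_efficiency(log)
print(report["*"])  # hit rate 0.5, byte hit rate about 0.0099
```

Here half the requests are hits, yet almost none of the bytes are, because only the small resource is being cached. Looking at both metrics per resource is what exposes this.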
  40. Measuring cache efficiency: validate your strategy at redbot.org
  41. Final considerations
  42. Final considerations: • It is important to know the topology of the application and the constraints of distributed systems well. • Think through and build a good strategy; don't rely on default heuristics. • Measure, monitor, and improve. Strategies are dynamic, and changing them is part of the process. • All this can be done incrementally; focus on the relevant resources. • Be careful not to turn the cache into overhead.
  43. References: • Web Protocols and Practice: HTTP/1.1, Networking Protocols, Caching, and Traffic Measurement (Balachander Krishnamurthy and Jennifer Rexford) • HTTP: The Definitive Guide (David Gourley, Brian Totty, Marjorie Sayer and Anshu Aggarwal) • HTTP/1.1 (RFC 2616): http://www.w3.org/Protocols/rfc2616/rfc2616.html • Caching in HTTP (RFC 2616, section 13): http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13 • http://stevesouders.com/ • http://talleye.com/ • https://dev.twitter.com/ • bit.ly/vinecaching
  44. Thank you!
