Caching in HTTP


How and what to cache

  1. HTTP Caching
     Alexander Shopov
  2. Alexander Shopov
     By day: Software Engineer at Cisco
     By night: OSS contributor, coordinator of Bulgarian Gnome TP
     Contacts:
     E-mail: ash@kambanaria.org
     Jabber: al_shopov@jabber.minus273.org
     LinkedIn: just search “al_shopov”
  3. Please Learn And Share
     License: CC-BY v3.0 (Creative Commons Attribution v3.0)
  4. Disclaimer
     My opinions, knowledge and experience! Not my employer's.
  5. Why Cache At All?
     ● Lowers the number of requests, improves latency, provides scaling
     ● AJAX caching leads to lively applications
     ● Lowers server load for all kinds of content, but especially important (and hard) for dynamic content
  6. MOST IMPORTANT RESOURCE!
     ● RFC 2616
     ● HTTP caching: –
  7. Purpose of caching
     ● Eliminate the need for requests
       – No server round trip at all: the fastest way
       – Expiration: received data is fine
     ● Eliminate the need for full answers
       – Lower traffic, narrow bandwidth
       – Validation: received data probably fine, check it
  8. HTTP participants
     ● All of them in the protocol from day 1, not an afterthought!
       – Origin server
       – Gateway/reverse proxy (shared cache)
       – Proxy (shared cache)
       – Client (can have an internal, non-shared cache)
     ● Gateway is similar to Proxy
       – Proxies: chosen by client (or clients)
       – Gateways: chosen by server
  9. Client ↔ Intermediaries ↔ Server
     ● Easy/safe upgrade of protocol during conversation
     ● Caching principles:
       – Semantically transparent
       – Explicit permits for non-transparent actions
       – Intermediaries can add warnings
       – Caching headers/directives can be one way
     ● Different behaviour for requests:
       – Safe requests: GET/HEAD. Breaking this is the server's fault, not the client's. All other requests must reach the origin server
       – Idempotent requests (repeating them ≡ doing them once): GET/HEAD + PUT/DELETE/OPTIONS/TRACE
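The request classes on the slide above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, and `may_answer_from_cache` encodes the slide's simplified rule rather than the full RFC 2616 cacheability logic.

```python
# Method properties as listed on the slide (RFC 2616, section 9.1).
SAFE_METHODS = frozenset({"GET", "HEAD"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE", "OPTIONS", "TRACE"}


def is_safe(method: str) -> bool:
    """True if the method is only meant to retrieve data."""
    return method.upper() in SAFE_METHODS


def is_idempotent(method: str) -> bool:
    """True if repeating the request has the same effect as sending it once."""
    return method.upper() in IDEMPOTENT_METHODS


def may_answer_from_cache(method: str) -> bool:
    """Simplified rule from the slide: only safe requests may be answered
    by an intermediary; all other requests must reach the origin server."""
    return is_safe(method)
```

For example, `is_idempotent("DELETE")` is true while `may_answer_from_cache("DELETE")` is false: a repeated DELETE is harmless, but it still has to reach the origin server.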
  10. HTML: Meta tags
      ● Widely used and as widely ineffective:
        – The only thing HTML designers can put
        – Not read/used by intermediaries
        – Not all browser caches honour them
      ● Do not rely on them! No real reason to use them. (Actually the real reason is that habit is second nature.)
  11. HTTP 1.0
      ● Pragma: no-cache
        – Pragmas are problematic: not all participants honour them.
      ● Proper equivalent in HTTP 1.1:
        – Cache-Control: no-cache
        – Take from server even if available from cache
  12. HTTP 1.1
      ● Expires: until then, have it fresh
      ● ETag: (do) you have this version
      ● Cache-Control: fine-grained tuning
  13. Expires
      ● Expires: absolute_date
      ● To mark a resource as already expired, send an Expires header equal to the Date header: Expires = Date
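A minimal sketch of producing these headers with the Python standard library. The helper name `expiry_headers` and its parameters are assumptions made for illustration; `email.utils.format_datetime` with `usegmt=True` emits the HTTP date format and requires a UTC-aware datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(now: datetime, lifetime: timedelta) -> dict:
    """Build Date and Expires headers for a response.

    `now` must be timezone-aware and in UTC. A zero lifetime yields
    Expires == Date, which marks the response as already expired.
    """
    return {
        "Date": format_datetime(now, usegmt=True),
        "Expires": format_datetime(now + lifetime, usegmt=True),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(expiry_headers(now, timedelta(hours=1)))  # fresh for one hour
    print(expiry_headers(now, timedelta(0)))        # already expired
```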
  14. ETag – 1
      ● No ordering, just a value: it either matches (a single value or a value from a set) or it does not.
      ● Per URI: no sense in comparing tags from different URIs, E = entity
      ● ETag: resource tag
        – ETag: "xyzzy" (strong, bit-by-bit equivalence)
        – ETag: W/"xyzzy" (weak, semantic equivalence)
      ● Different matches
        – Strong: the values match and both tags are strong.
        – Weak: the values match; either tag may be weak.
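The two comparison functions can be sketched as follows. The function names are hypothetical; the logic follows the strong/weak rules on the slide and treats the quoted value as opaque.

```python
def parse_etag(tag: str) -> tuple:
    """Split an entity tag into (is_weak, opaque_value).

    The opaque value keeps its surrounding quotes; it is only compared,
    never interpreted.
    """
    tag = tag.strip()
    weak = tag.startswith("W/")
    return weak, (tag[2:] if weak else tag)


def strong_match(a: str, b: str) -> bool:
    """Values are identical and neither tag is weak."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a: str, b: str) -> bool:
    """Values are identical; either tag may carry the W/ prefix."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

With the tags from the slide, `weak_match('W/"xyzzy"', '"xyzzy"')` holds, while `strong_match` on the same pair does not.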
  15. ETag – 2
      ● Conditional requests: if matching, just a confirmation; otherwise, the data itself
        – If-Modified-Since
        – If-Unmodified-Since
        – If-Match
        – If-None-Match
        – If-Range
      ● Strong tags allow for caching of partial answers
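As an illustration of the "confirmation or data" idea, here is a sketch of how an origin server might answer a GET carrying If-None-Match. The function names and the `(status, headers, body)` return shape are assumptions, not any framework's API, and splitting the header on commas is a simplification that ignores commas inside a quoted tag.

```python
def _opaque(tag: str) -> str:
    """Strip whitespace and an optional weak prefix from an entity tag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header_value: str, current_etag: str) -> bool:
    """Weak comparison of the current tag against an If-None-Match list."""
    if header_value.strip() == "*":
        return True  # "*" matches any existing entity
    candidates = {_opaque(t) for t in header_value.split(",")}
    return _opaque(current_etag) in candidates


def respond(request_headers: dict, current_etag: str, body: bytes) -> tuple:
    """Return (status, headers, body) for a GET on a single resource."""
    headers = {"ETag": current_etag}
    inm = request_headers.get("If-None-Match")
    if inm is not None and if_none_match_hits(inm, current_etag):
        return 304, headers, b""  # confirmation only, empty body
    return 200, headers, body     # the data itself
```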
  16. Cache-Control
      ● All HTTP 1.1 participants MUST obey it (otherwise they are broken).
      ● MUST reach all participants
      ● Cannot target a particular intermediary
  17. Cache-Control
      ● cache-request-directive
        – no-cache
        – no-store
        – max-age
        – max-stale
        – min-fresh
        – no-transform
        – only-if-cached
        – cache-extension
      ● cache-response-directive
        – public
        – private
        – no-cache
        – no-store
        – no-transform
        – must-revalidate
        – proxy-revalidate
        – max-age
        – s-maxage
        – cache-extension
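A Cache-Control header is a comma-separated list of such directives, some of which take a value. The following is a deliberately naive parsing sketch: `parse_cache_control` is a hypothetical helper, and it does not handle commas inside quoted arguments (for example `private="a, b"`).

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: argument}.

    Directive names are lower-cased; directives without an argument
    map to None. Arguments are returned as strings, unquoted.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


if __name__ == "__main__":
    print(parse_cache_control("public, max-age=3600, s-maxage=600"))
```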
  18. Cache-Control Categories
      ● What is cacheable: only imposed by server
      ● What can be stored in cache: imposed by server and client
      ● Modifications on expiration: imposed by server and client
      ● Control over cache revalidation and reload: only imposed by client
      ● Control over transformation of entities
      ● Extensions to the caching system
  19. Cache-Control – Requests 1
      ● no-cache: cache should revalidate with server
      ● no-store: do not store on durable media
      ● max-age=sec: client wants info no older than this
      ● max-stale[=sec]: client accepts stale information, but no more stale than this
  20. Cache-Control – Requests 2
      ● min-fresh=sec: client wants info that will stay fresh for at least this long
      ● no-transform: no transformation by intermediaries
        – e.g. a medical X-ray photo converted from PNG to JPEG
      ● only-if-cached: when the connection to the server is bad. Better to get 504 (Gateway Timeout) than to wait
      ● cache-extension: extensions
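How a cache might weigh the request directives above against a stored response can be sketched like this. It is a simplified model under stated assumptions: `cache_may_serve` is a hypothetical name, the age and freshness lifetime are given in seconds, and the directives arrive as an already parsed dict with integer values (`None` for a bare `max-stale`).

```python
def cache_may_serve(age: int, freshness_lifetime: int, directives: dict) -> bool:
    """Decide whether a stored response satisfies the request directives.

    age                -- seconds since the response left the origin server
    freshness_lifetime -- seconds the server declared the response fresh
    directives         -- request Cache-Control, e.g. {"max-age": 60}
    """
    if "no-cache" in directives:
        return False  # the client insists on revalidation
    if "max-age" in directives and age > directives["max-age"]:
        return False  # older than the client accepts
    remaining = freshness_lifetime - age  # zero or negative means stale
    if "min-fresh" in directives and remaining < directives["min-fresh"]:
        return False  # would not stay fresh long enough
    if remaining > 0:
        return True   # still fresh
    if "max-stale" in directives:
        limit = directives["max-stale"]
        # A bare max-stale accepts any staleness.
        return limit is None or -remaining <= limit
    return False
```

For instance, a response that is 70 seconds old with a 60-second lifetime is served only if the client sent `max-stale` with no value or a value of at least 10.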
  21. Cache-Control – Responses 1
      ● public: may be cached by any cache
      ● private: must not be cached by a shared cache
      ● no-cache: cache should revalidate with server
      ● no-store: do not store on durable media
      ● no-transform: no transformation by intermediaries
      ● must-revalidate: server requires revalidation of stale data
      ● proxy-revalidate: same as above, but not for the user agent cache
  22. Cache-Control – Responses 2
      ● max-age=sec: for any cache
      ● s-maxage=sec: for shared caches, takes priority over max-age and Expires
      ● cache-extension: extensions
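The priority order among the response directives can be sketched as below. The helper names are hypothetical and the Cache-Control parsing is naive; the sketch assumes well-formed numeric arguments and treats an unparseable Expires or Date as "already expired".

```python
from email.utils import parsedate_to_datetime


def _directives(headers: dict) -> dict:
    """Naive Cache-Control parser: {lower-cased name: argument or None}."""
    result = {}
    for part in headers.get("Cache-Control", "").split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            result[name.lower()] = arg if sep else None
    return result


def may_store(headers: dict, shared_cache: bool) -> bool:
    """no-store forbids storing; private forbids storing in shared caches."""
    cc = _directives(headers)
    if "no-store" in cc:
        return False
    return not (shared_cache and "private" in cc)


def freshness_lifetime(headers: dict, shared_cache: bool):
    """Seconds the response stays fresh, or None without explicit expiry.

    Priority: s-maxage (shared caches only), then max-age, then Expires.
    """
    cc = _directives(headers)
    if shared_cache and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])
    if cc.get("max-age") is not None:
        return int(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
        except (TypeError, ValueError):
            return 0  # invalid dates: treat as already expired
        return max(0, int((expires - date).total_seconds()))
    return None
```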
  23. Status Codes 1
      ● 201 Created: resource created, can contain ETag (contrast with 202)
      ● 203 Non-Authoritative Information: not from originating server but from cache
      ● 206 Partial Content: answer to a range (partial GET) request; contains ETag, Expires, Cache-Control, Vary if changeable. Result of If-Range. If either ETag or Last-Modified does not match, the cache does not combine the part with others. If the cache does not support ranges, 206 is not cached.
  24. Status Codes 2
      ● 302 Found: redirect that can change. Use Cache-Control or Expires
      ● 304 Not Modified: conditional GET, resource not changed, body of response empty (ETag/Content-Location, Expires, Cache-Control or Vary)
      ● 305 Use Proxy: per request, generated by server
      ● 307 Temporary Redirect: similar to 302
  25. Conditional requests/responses
      ● Origin servers
        – Should provide both ETag (preferably strong, unless not feasible) and Last-Modified
        – Must avoid reusing a specific strong ETag for different entities
      ● Clients
        – Must/should use the ETag and Last-Modified they received in conditional requests
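On the client side, this amounts to echoing the stored validators back in the next request. A minimal sketch, with hypothetical helper names and plain dicts standing in for header collections:

```python
def conditional_headers(stored_headers: dict) -> dict:
    """Build conditional request headers from a stored response's validators."""
    cond = {}
    if "ETag" in stored_headers:
        cond["If-None-Match"] = stored_headers["ETag"]
    if "Last-Modified" in stored_headers:
        cond["If-Modified-Since"] = stored_headers["Last-Modified"]
    return cond


def apply_validation(stored_headers: dict, stored_body: bytes,
                     status: int, new_headers: dict, new_body: bytes) -> tuple:
    """Return the (headers, body) the cache should hold after revalidation.

    On 304 the stored body is kept and headers are refreshed from the
    response; on any other status the new response replaces the old one.
    """
    if status == 304:
        return {**stored_headers, **new_headers}, stored_body
    return dict(new_headers), new_body
```

Sending both validators when both are available lets servers that only understand Last-Modified still answer with 304.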
  26. AJAX
      ● Use cache directives in AJAX
      ● Try to make your AJAX responses cacheable (you will have to think!)
      ● POSTs are mostly uncacheable; prefer GETs to fetch information
      ● Generate Content-Length response headers and reuse the TCP/IP connection
  27. Tools 1
      ● Firefox addons:
        – Firebug
        – LiveHTTPHeaders
        – Modify Headers
      ● Chrome, Opera, Internet Explorer dev tools (F12)
  28. Tools 2
      ● Mark Nottingham: Caching tutorial
      ● Redbot: Check cacheability
      ● Old, but gold: Cacheability