Alexander Shopov
● By day: Software Engineer at Cisco
● By night: OSS contributor, Coordinator of the Bulgarian Gnome TP
● Contacts:
  – E-mail: email@example.com
  – Jabber: firstname.lastname@example.org
  – LinkedIn: http://www.linkedin.com/in/alshopov
  – Google: Just search “al_shopov”
Disclaimer
● My opinions, knowledge and experience! Not my employer's.
Why Cache At All?
● Lowers the number of requests, improves latency, helps scaling
● AJAX caching leads to more responsive applications
● Lowers server load for all kinds of content; especially important (and hard) for dynamic content
MOST IMPORTANT RESOURCE!
● RFC 2616: http://tools.ietf.org/html/rfc2616
● HTTP caching:
  – http://tools.ietf.org/html/rfc2616#section-13
● Note: RFC 2616 has since been obsoleted; HTTP caching is now specified in RFC 9111
Purpose of caching
● Eliminate the need for requests
  – No server round trip at all: the fastest way
  – Mechanism: expiration (the received data is still fresh, use it as is)
● Eliminate the need for full responses
  – Less traffic, helps over narrow bandwidth
  – Mechanism: validation (the received data is probably fine, ask the server to confirm it)
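A minimal Python sketch of how a cache could choose between the two mechanisms. `CachedEntry` and `next_action` are hypothetical names. The sketch assumes the entry's age and freshness lifetime are already known. A real cache also has to honour Cache-Control directives, Vary and more.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    # Hypothetical cache entry: seconds since it was received, how long it
    # stays fresh, and the validators the server sent (if any).
    age: float
    freshness_lifetime: float
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def next_action(entry: Optional[CachedEntry]) -> str:
    """Decide how a cache satisfies a GET for a resource it may hold."""
    if entry is None:
        return "full request"          # nothing cached: ordinary GET
    if entry.age < entry.freshness_lifetime:
        return "serve from cache"      # expiration: no round trip at all
    if entry.etag or entry.last_modified:
        return "conditional request"   # validation: a 304 saves the body
    return "full request"              # stale and no validator: refetch everything
```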
HTTP participants
● All of them have been in the protocol from day 1, not added as an afterthought!
  – Origin server
  – Gateway/reverse proxy (shared cache)
  – Proxy (shared cache)
  – Client (can have an internal, non-shared cache)
● A gateway is similar to a proxy
  – Proxies are chosen by the client (or clients)
  – Gateways are chosen by the server
Client ↔ Intermediaries ↔ Server
● Easy/safe upgrade of the protocol during a conversation
● Caching principles:
  – Semantically transparent
  – Explicit permission is required for non-transparent actions
  – Intermediaries can add warnings
  – Caching headers/directives can be one-way: a directive in a request does not imply the same directive in the response
● Different behaviour for requests:
  – Safe requests: GET/HEAD. If these have side effects, that is the server's fault, not the client's. All other requests must reach the origin server
  – Idempotent requests (repeating them ≡ doing them once): GET/HEAD + PUT/DELETE/OPTIONS/TRACE
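The two method classes can be written down directly. This is a sketch. The helper names are hypothetical. Treating only GET/HEAD as answerable from a cache follows the simplification on this slide.

```python
# Method classes as listed on the slide (RFC 2616, section 9.1).
SAFE_METHODS = frozenset({"GET", "HEAD"})
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE", "OPTIONS", "TRACE"}


def may_answer_from_cache(method: str) -> bool:
    """Safe requests may be answered by a cache; all others go to the origin server."""
    return method.upper() in SAFE_METHODS


def may_retry_automatically(method: str) -> bool:
    """Idempotent requests can be repeated, e.g. after a dropped connection."""
    return method.upper() in IDEMPOTENT_METHODS
```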
HTML: Meta tags
● Widely used and just as widely ineffective:
  – The only thing HTML designers can add themselves
  – Not read/used by intermediaries
  – Not all browser caches honour them
● Do not rely on them! There is no real reason to use them (the actual reason people still do is that habit is second nature).
HTTP 1.0
● Pragma: no-cache
  – Pragmas are problematic: not all participants honour them
● Proper equivalent in HTTP 1.1:
  – Cache-Control: no-cache
  – Fetch from the server even if a copy is available in the cache
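A sketch of how an HTTP 1.1 cache might detect a no-cache request while still honouring the HTTP 1.0 pragma. The function names are hypothetical. Header parsing is simplified to comma splitting.

```python
from typing import Mapping


def _tokens(value: str) -> list[str]:
    # Split a comma-separated header value into lower-cased tokens.
    return [t.strip().lower() for t in value.split(",") if t.strip()]


def request_wants_no_cache(headers: Mapping[str, str]) -> bool:
    """True if the request forbids answering from a cached copy without revalidating."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # HTTP 1.1 directive.
    if "no-cache" in _tokens(lowered.get("cache-control", "")):
        return True
    # HTTP 1.0 fallback: HTTP 1.1 caches SHOULD treat "Pragma: no-cache"
    # as if the client had sent "Cache-Control: no-cache" (RFC 2616, 14.32).
    return "no-cache" in _tokens(lowered.get("pragma", ""))
```

RFC 2616 (section 14.32) also advises clients to send both `Cache-Control: no-cache` and `Pragma: no-cache` when the server is not known to be HTTP 1.1 compliant.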
HTTP 1.1
● Expires: until then, the response is fresh
● ETag: do you (still) have this version?
● Cache-Control: fine-grained tuning
Expires
● Expires: absolute_date (an HTTP-date)
● To mark a response as already expired, send an Expires value equal to the Date header value
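A sketch of the Expires arithmetic. The freshness lifetime is Expires minus Date (RFC 2616, section 13.2.4). It uses only Python's standard `email.utils.parsedate_to_datetime`. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime


def freshness_from_expires(expires: str, date: str) -> float:
    """Seconds a response stays fresh according to its Expires and Date headers."""
    try:
        lifetime = (parsedate_to_datetime(expires)
                    - parsedate_to_datetime(date)).total_seconds()
    except (TypeError, ValueError):
        # Invalid dates (notably "Expires: 0") are treated as already expired.
        return 0.0
    # Expires equal to (or earlier than) Date means "already expired".
    return max(0.0, lifetime)
```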
ETag – 1
● No ordering, just a value: it either matches (a single value or a value from a set) or it does not
● Per URI: there is no sense in comparing tags from different URIs (E = entity)
● ETag: entity tag
  – ETag: "xyzzy": strong, byte-for-byte equivalence
  – ETag: W/"xyzzy": weak, semantic equivalence
● Different matches
  – Strong comparison: the values match and both tags are strong
  – Weak comparison: the values match; either tag may be weak
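The two comparison functions from RFC 2616 (section 13.3.3) can be sketched as follows. The sketch assumes single, well-formed entity tags. The helper names are hypothetical.

```python
def _parse_etag(tag: str) -> tuple[bool, str]:
    """Split an entity tag into (is_weak, opaque_value)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    # Identical values AND neither tag is weak.
    weak_a, value_a = _parse_etag(a)
    weak_b, value_b = _parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a: str, b: str) -> bool:
    # Identical values; the W/ prefix is ignored.
    return _parse_etag(a)[1] == _parse_etag(b)[1]
```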
ETag – 2
● Conditional requests (e.g. a GET with If-None-Match or If-Modified-Since): if the validator matches, only a confirmation (304) is sent; otherwise, the data itself
  – If-Modified-Since
  – If-Unmodified-Since
  – If-Match
  – If-None-Match
  – If-Range
● Strong tags allow caching (and combining) of partial responses
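A server-side sketch of evaluating If-None-Match for a GET. GET/HEAD may use the weak comparison, and "*" matches any current entity. The function name is hypothetical. Splitting the header on commas is a simplification that ignores tags containing commas.

```python
from typing import Optional


def status_for_conditional_get(current_etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 if the client's copy is current, else 200 (send the full entity)."""
    if if_none_match is None:
        return 200                      # unconditional GET
    candidates = [t.strip() for t in if_none_match.split(",")]
    if "*" in candidates:
        return 304                      # "*" matches any current entity
    # Weak comparison: ignore the W/ prefix on both sides.
    current = current_etag.removeprefix("W/")
    if any(t.removeprefix("W/") == current for t in candidates):
        return 304                      # just a confirmation, no body
    return 200                          # the data itself
```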
Cache-Control
● All HTTP 1.1 caching participants MUST obey it (otherwise they are broken)
● It MUST be passed on by intermediaries so that it reaches all participants
● It cannot target a particular intermediary
Cache-Control Categories
● What is cacheable: imposed only by the origin server
● What may be stored by caches: imposed by server and client
● Modifications of the basic expiration mechanism: imposed by server and client
● Control over cache revalidation and reload: imposed only by the client
● Control over transformation of entities
● Extensions to the caching system
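Cache-Control – Requests 1
● no-cache: caches must revalidate with the origin server before using a cached copy
● no-store: the request and its response must not be stored (in particular, not on non-volatile storage)
● max-age=sec: the client wants a response no older than this
● max-stale[=sec]: the client accepts a stale response, but no more stale than this (any staleness if no value is given)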
Cache-Control – Requests 2
● min-fresh=sec: the client wants a response that will stay fresh for at least this long
● no-transform: no transformation by intermediaries (e.g. a medical X-ray photo must not be converted from PNG to JPEG)
● only-if-cached: for when the connection to the server is poor; better to get 504 (Gateway Timeout) than to wait
● cache-extension: extensions
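A sketch of how a cache might apply the request directives max-age, min-fresh and max-stale to a stored response whose age and freshness lifetime are already known. The function names are hypothetical. The sketch assumes well-formed directive values. It ignores no-cache and the response's must-revalidate, which forbids serving stale content even when max-stale allows it.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Parse 'max-age=60, max-stale' into {'max-age': '60', 'max-stale': None}."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def cached_response_acceptable(request_cc: str, current_age: int,
                               freshness_lifetime: int) -> bool:
    """Check a cached response against the client's Cache-Control request directives."""
    d = parse_cache_control(request_cc)
    if "max-age" in d and current_age > int(d["max-age"]):
        return False                    # too old for this client
    if "min-fresh" in d and freshness_lifetime - current_age < int(d["min-fresh"]):
        return False                    # would not stay fresh long enough
    staleness = current_age - freshness_lifetime
    if staleness < 0:
        return True                     # still fresh
    if "max-stale" in d:
        limit = d["max-stale"]
        # No value: any staleness is acceptable.
        return limit is None or staleness <= int(limit)
    return False                        # stale, and the client did not allow it
```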
Cache-Control – Responses 1
● public: may be cached by any cache
● private: must not be cached by a shared cache
● no-cache: caches must revalidate with the origin server before reusing the response
● no-store: must not be stored (in particular, not on non-volatile storage)
● no-transform: no transformation by intermediaries
● must-revalidate: once stale, the response must be revalidated with the origin server before reuse
● proxy-revalidate: same as above, but does not apply to user agent (private) caches
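A simplified sketch of whether a shared cache may keep and reuse a response for other users. The function names are hypothetical. The sketch treats `private` as applying to the whole response, although RFC 2616 also allows `private="field-name"`. The Authorization rule follows RFC 2616, section 14.8.

```python
def _directive_names(cache_control: str) -> set[str]:
    # "private, max-age=60" -> {"private", "max-age"}
    return {part.split("=", 1)[0].strip().lower()
            for part in cache_control.split(",") if part.strip()}


def shared_cache_allowed(response_cc: str, request_had_authorization: bool = False) -> bool:
    """Is a shared cache allowed to store and reuse this response?"""
    names = _directive_names(response_cc)
    if "no-store" in names or "private" in names:
        return False
    if request_had_authorization:
        # Responses to authenticated requests are only reusable by shared
        # caches when the server explicitly allows it.
        return bool(names & {"public", "s-maxage", "must-revalidate"})
    return True
```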
Cache-Control – Responses 2
● max-age=sec: for any cache; takes priority over Expires
● s-maxage=sec: for shared caches only; takes priority over max-age and Expires
● cache-extension: extensions
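The priority order can be sketched as a single function following RFC 2616, section 13.2.4: s-maxage for shared caches, then max-age, then Expires minus Date. The function name is hypothetical. Cache-Control parsing is simplified to comma splitting.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def explicit_freshness_lifetime(headers: Mapping[str, str], shared: bool) -> Optional[float]:
    """Explicit freshness lifetime in seconds, or None if the server gave none."""
    h = {k.lower(): v for k, v in headers.items()}
    directives: dict[str, str] = {}
    for part in h.get("cache-control", "").split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"')
    if shared and directives.get("s-maxage"):
        return float(directives["s-maxage"])     # shared caches: s-maxage wins
    if directives.get("max-age"):
        return float(directives["max-age"])      # max-age overrides Expires
    if "expires" in h and "date" in h:
        try:
            delta = parsedate_to_datetime(h["expires"]) - parsedate_to_datetime(h["date"])
        except (TypeError, ValueError):
            return 0.0                           # invalid Expires: already expired
        return max(0.0, delta.total_seconds())
    return None                                  # no explicit expiration given
```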
Status Codes 1
● 201 Created: a resource was created; the response can contain an ETag (contrast with 202 Accepted)
● 203 Non-Authoritative Information: the metainformation is not from the origin server but from a local or third-party copy
● 206 Partial Content: the response to a range (partial) GET request
  – Contains ETag, plus Expires, Cache-Control and Vary if they may have changed
  – Also the response to a matching If-Range request
  – If either ETag or Last-Modified doesn't match exactly, the cache must not combine it with other stored parts
  – A cache that does not support ranges must not cache 206 responses
Status Codes 2
● 302 Found: a redirect that may change; cacheable only if Cache-Control or Expires says so
● 304 Not Modified: the response to a conditional GET when the resource has not changed; it has no body (and may carry ETag and/or Content-Location, Expires, Cache-Control, Vary)
● 305 Use Proxy: repeat this single request through the proxy given in Location; generated only by origin servers
● 307 Temporary Redirect: similar to 302
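A sketch of the status-code rules in RFC 2616, section 13.4. Some codes are cacheable by default. Any other code, such as 302 or 307, needs explicit permission through Expires or a Cache-Control directive. The function and constant names are hypothetical. Other rules, such as no-store, still apply on top of this check.

```python
from typing import Mapping

# Responses with these codes MAY be stored and reused (subject to other rules).
CACHEABLE_BY_DEFAULT = frozenset({200, 203, 206, 300, 301, 410})
# Directives that explicitly allow caching of any other status code.
EXPLICIT_DIRECTIVES = frozenset({"max-age", "s-maxage", "must-revalidate",
                                 "proxy-revalidate", "public", "private"})


def status_is_cacheable(status: int, headers: Mapping[str, str],
                        cache_supports_ranges: bool = True) -> bool:
    h = {name.lower(): value for name, value in headers.items()}
    if status == 206 and not cache_supports_ranges:
        return False            # such caches must not cache 206 responses
    if status in CACHEABLE_BY_DEFAULT:
        return True
    # 302 Found, 307 Temporary Redirect, ...: only with explicit permission.
    directives = {part.split("=", 1)[0].strip().lower()
                  for part in h.get("cache-control", "").split(",")}
    return "expires" in h or bool(directives & EXPLICIT_DIRECTIVES)
```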
Conditional requests/responses
● Origin servers
  – Should provide both an ETag (preferably strong, unless that is not feasible) and Last-Modified
  – Must avoid reusing a specific strong ETag for different entities
● Clients
  – Must use the ETag they received in conditional requests (If-None-Match/If-Match)
  – Should use Last-Modified (If-Modified-Since); if both are available, should send both
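A client-side sketch that builds revalidation headers from the headers of a cached response. The function name is hypothetical.

```python
from typing import Mapping


def conditional_headers(cached: Mapping[str, str]) -> dict[str, str]:
    """Build revalidation headers from the headers of a cached response."""
    h = {name.lower(): value for name, value in cached.items()}
    headers: dict[str, str] = {}
    if "etag" in h:
        # The entity tag MUST be used if the server provided one.
        headers["If-None-Match"] = h["etag"]
    if "last-modified" in h:
        # Last-Modified SHOULD be used as well.
        headers["If-Modified-Since"] = h["last-modified"]
    return headers
```

If these headers are sent with Python's `urllib.request`, note that a 304 answer surfaces as a `urllib.error.HTTPError` whose `code` is 304.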
AJAX
● Use cache directives in AJAX responses
● Try to make your AJAX responses cacheable (you will have to think!)
● POST responses are mostly uncacheable; prefer GET to fetch information
● Generate Content-Length response headers and reuse TCP/IP connections (persistent connections)
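A sketch of the advice above, applied to a JSON response for an AJAX GET. It sends a cache directive, a validator and a Content-Length, and answers with 304 when the client's copy is current. The function name, the `private, max-age` policy and the hash-based ETag are assumptions chosen for illustration, not a prescribed scheme. The ETag check uses plain string equality on comma-split tags.

```python
import hashlib
from typing import Optional


def json_get_response(body: bytes, if_none_match: Optional[str],
                      max_age: int = 60) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a cacheable JSON GET response."""
    # Assumption: a content hash is used as a strong entity tag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    headers = {
        # Assumption: per-user data, so only the browser cache may keep it.
        "Cache-Control": f"private, max-age={max_age}",
        "ETag": etag,
    }
    if if_none_match is not None and etag in [t.strip() for t in if_none_match.split(",")]:
        return 304, headers, b""        # client's copy is current: no body
    headers["Content-Type"] = "application/json"
    # Content-Length lets the client reuse the persistent connection.
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```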
Tools 1
● Firefox add-ons:
  – Firebug
  – LiveHTTPHeaders
  – Modify Headers
● Chrome, Opera, Internet Explorer dev tools (F12)
Tools 2
● Mark Nottingham: Caching tutorial
● Redbot: check cacheability
● Old, but gold: Cacheability