cache concepts and varnish-cache

Describes cache concepts and gives an introduction to Varnish.

Published in: Engineering

  1. Cache Concepts and Varnish-Cache: Playing with Varnish. Marc Cortinas – Production Service - Webops - September 2014
  2. Agenda
     Part 1: Cache Concepts (10 min)
       1. What is Caching?
       2. Cache levels and types
       3. The Rules
       4. Header Pragma
       5. Header Last-Modified
       6. Header ETag
       7. Header Expires
       8. Header Cache-Control
       9. Cache key
       10. Methods and cacheability
     Part 2: Varnish-Cache (30 min)
       1. What is Varnish-cache?
       2. Process Architecture
       3. Installation and Basic Configuration
       4. VCL Backends, Probes, Directors
       5. VCL functions or subroutines
       6. VCL Reference
       7. VCL Variables Availability
       8. VCL Subroutines Graph
       9. VCL Stale Content
       10. Our VCL configuration in cdn-own.edreams.com
       11. VMOD Directory
       12. Tuning and best practices
       13. Proof of concept with siege (HTTP only)
  3. Cache Levels and Types
     Levels:
       • Browser
       • ISP
       • Proxies
       • CDNs/Varnish/Nginx/Apache (mod_cache)
       • CPD (data center): LTM/Varnish/Nginx/Apache (mod_cache)
       • Code cache: APC (PHP)
       • Data cache: redis, etc.
       • Disk cache
     Types:
       - Local cache
       - Reverse proxy caches
       - Data cache
       - Code cache
       - Disk cache
  4. What is Caching?
     Caching is a great example of the ubiquitous time-space tradeoff in programming. You can save time by using space to store results. In the case of websites, the browser can save a copy of images, stylesheets, JavaScript or the entire page. The next time the user needs that resource (such as a script or logo that appears on every page), the browser doesn't have to download it again. Fewer downloads mean a faster, happier site.
     Here's a quick refresher on how a web browser gets a page from the server:
     [Figure: a web browser requesting a page from the server]
  5. The Rules
     The main HTTP headers help us distribute and cache content efficiently.
     RFCs:
       1. HTTP/1.0 (RFC 1945): http://www.rfc-base.org/rfc-1945.html
       2. HTTP/1.1 (RFC 2616): http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
     HTTP headers are our friends!
  6. Header Pragma
     The Pragma header is "deprecated".
       1. HTTP/1.0: a request header used to revalidate any cached response before using it.
       2. HTTP/1.1: only honored by HTTP/1.1 caches when Cache-Control is missing.
  7. Header Last-Modified
     One fix is for the server to tell the browser what version of the file it is sending. A server can return a Last-Modified date along with the file (let's call it logo.png), like this:
       Last-Modified: Fri, 16 Mar 2007 04:00:25 GMT
     A 304 (Not Modified) response is cheaper than sending the whole object again!
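The Last-Modified check can be sketched in a few lines of Python. This is only an illustration, not how any particular server implements it: the function name `conditional_status` is hypothetical, and it assumes the client sends the date it received back in an `If-Modified-Since` request header.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: str, if_modified_since: Optional[str]) -> int:
    """Return 304 when the client's copy is still current, otherwise 200."""
    if if_modified_since is None:
        # First visit: the client has no copy, so the full object is sent.
        return 200
    resource_time = parsedate_to_datetime(last_modified)
    client_time = parsedate_to_datetime(if_modified_since)
    # The resource has not changed since the client's copy was fetched.
    return 304 if resource_time <= client_time else 200
```

A 304 answer carries headers only, which is why it is so much cheaper than re-sending logo.png.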
  8. Header ETag
     What if the server's clock was originally wrong and then got fixed? What if daylight saving time comes early and the server isn't updated? The caches could be inaccurate. ETags to the rescue. An ETag is a unique identifier given to every file. It's like a hash or fingerprint: every file gets a unique fingerprint, and if you change the file (even by one byte), the fingerprint changes as well.
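The fingerprint idea can be illustrated with a content hash. This is a sketch under assumptions: real servers build ETags in different ways (often from file size and modification time), and `make_etag` and `etag_status` are hypothetical names. The client is assumed to send its stored tag back in an `If-None-Match` header.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a hash of the content (one possible scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_status(body: bytes, if_none_match: Optional[str]) -> int:
    """Return 304 when the client's tag matches the current content, else 200."""
    return 304 if if_none_match == make_etag(body) else 200
```

Because the tag depends only on the bytes, a clock problem on the server cannot make it wrong.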
  9. Header Expires
     Caching a file and checking with the server is nice, except for one thing: we are still checking with the server. The Expires header tells the cache until when it can use its copy without asking.
     Example: Expires: Fri, 30 Oct 1998 14:19:41 GMT (a date in the past = uncacheable)
       - Absolute time (totally dependent on clocks)
       - Can be set relative to the last time the client retrieved the document (last access time)
       - Can be set relative to the last time the document changed on your server (last modification time)
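A minimal sketch of the Expires check, assuming the cache simply compares the header with its own clock. The function name `is_fresh` is hypothetical, and treating an unparseable date as already expired is a choice made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires: str, now: datetime) -> bool:
    """Return True while the cached copy may be used without contacting the server."""
    try:
        expiry = parsedate_to_datetime(expires)
    except (TypeError, ValueError):
        # An invalid date is treated as "already expired" in this sketch.
        return False
    return now < expiry


now = datetime(2014, 9, 1, tzinfo=timezone.utc)
print(is_fresh("Fri, 30 Oct 1998 14:19:41 GMT", now))
```

The comparison only works if the clocks of the cache and the server agree, which is the weakness the slide points out.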
  10. Header Cache-Control
      Expires is great, but it has to be computed for every date. The max-age directive lets us say "This file expires 1 week from today", which is simpler than setting an explicit date.
        • max-age=[seconds]: specifies the maximum amount of time that a representation will be considered fresh. Similar to Expires, but this directive is relative to the time of the request, rather than absolute. [seconds] is the number of seconds from the time of the request you wish the representation to be fresh for.
        • s-maxage=[seconds]: similar to max-age, except that it only applies to shared (e.g., proxy) caches.
        • public: marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
        • private: allows caches that are specific to one user (e.g., in a browser) to store the response; shared caches (e.g., in a proxy) may not.
        • no-cache: forces caches to submit the request to the origin server for validation before releasing a cached copy, every time. This is useful to assure that authentication is respected (in combination with public), or to maintain rigid freshness, without sacrificing all of the benefits of caching.
        • no-store: instructs caches not to keep a copy of the representation under any conditions.
        • must-revalidate: tells caches that they must obey any freshness information you give them about a representation. HTTP allows caches to serve stale representations under special conditions; by specifying this directive, you're telling the cache that you want it to strictly follow your rules.
        • proxy-revalidate: similar to must-revalidate, except that it only applies to proxy caches.
      Examples:
        Cache-Control: public, max-age=0, s-maxage=0, no-cache, no-store, must-revalidate, proxy-revalidate
        Cache-Control: public, max-age=3600
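To make the directives concrete, here is a rough Python sketch of how a shared cache could turn a Cache-Control value into a time to live. It is a simplification and not Varnish's actual logic: the names `parse_cache_control` and `shared_cache_ttl` are hypothetical, and `no-cache` is reduced to "never serve without revalidation", expressed as a TTL of 0.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_ttl(value: str) -> int:
    """Seconds a shared cache may serve the response without revalidating."""
    d = parse_cache_control(value)
    if "no-store" in d or "no-cache" in d or "private" in d:
        return 0
    # s-maxage takes precedence over max-age for shared caches.
    for name in ("s-maxage", "max-age"):
        if d.get(name) is not None:
            return max(int(d[name]), 0)
    return 0


print(shared_cache_ttl("public, max-age=3600"))
```

With the two example headers from the slide, the first yields 0 (never served from cache) and the second 3600 seconds.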
  11. The Cache Key
      URI split: <protocol>://<user>:<passwd>@<host>:<port>/<path>?<qsa>
      What's the cache key (the HASH in Varnish)? The key to find the object again!
      - Akamai cache key:
          ❯ akacurl "http://cdn-aka.edreams.com/?a=1&b=2&c=3" 2>&1 | grep -i "X-Cache-key"
          X-Cache-Key: /L/728/323898/1h/cdn-aka.edreams.com/?a=1&b=2&c=3
      - Varnish HASH:
          sub vcl_hash {
              # The URL is always part of the key.
              hash_data(req.url);
              # Add the Host header, or the server IP when there is none.
              if (req.http.host) {
                  hash_data(req.http.host);
              } else {
                  hash_data(server.ip);
              }
              return (lookup);
          }
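The same key construction can be mimicked in Python to see which requests share a cache entry. This is an illustration only: `cache_key` is a hypothetical helper, and the SHA-256 digest and the separator byte are assumptions made for the sketch.

```python
import hashlib
from typing import Optional


def cache_key(url: str, host: Optional[str], server_ip: str = "127.0.0.1") -> str:
    """Hash the URL plus the Host header (or the server IP when Host is absent)."""
    h = hashlib.sha256()
    h.update(url.encode())
    h.update(b"\0")  # separator so that url/host boundaries cannot be confused
    h.update((host if host else server_ip).encode())
    return h.hexdigest()


same = cache_key("/?a=1&b=2&c=3", "cdn-own.edreams.com") == cache_key("/?c=3&b=2&a=1", "cdn-own.edreams.com")
print(same)
```

Since the raw URL is hashed, the same parameters in a different order produce a different key, and therefore a separate cached object.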
  12. Methods and Cacheability
        HTTP method | Cacheability
        GET         | Yes
        HEAD        | Yes
        POST        | No
        PUT         | No
        DELETE      | No
        OPTIONS     | No
        TRACE       | No
        CONNECT     | No
        PATCH       | No
  13. Different ways to define cache TTL
      • Application: PHP, Java, etc. (best for custom scenarios)
      • HTTP server:
          - By FilesMatch. Ex: <FilesMatch "\.(html|htm|php|cgi|pl)$">
          - By Location. Ex: <Location "/deal">
          - By LocationMatch (regular expression). Ex: <LocationMatch "/offers/.*/today/.*">
  14. Part 2: What's varnish-cache?
      Project web: https://www.varnish-cache.org/
      Documentation: https://www.varnish-cache.org/docs/4.0/reference/index.html
      GitHub: https://github.com/varnish/Varnish-Cache
      Varnish is an HTTP accelerator designed for content-heavy dynamic web sites. In contrast to other web accelerators, such as Squid, which began life as a client-side cache, or Apache and nginx, which are primarily origin servers, Varnish was designed as an HTTP accelerator. Varnish is focused exclusively on HTTP, unlike other proxy servers that often support FTP, SMTP and other network protocols.
      Version 1.0 of Varnish was released in 2006, Varnish 2.0 in 2008, Varnish 3.0 in 2011, and Varnish 4.0 in 2014.
  15. Process Architecture
      1. The management process applies configuration changes (VCL and parameters), compiles VCL, monitors Varnish, initializes Varnish and provides a command line interface, accessible either directly on the terminal or through a management interface.
      2. The child process consists of several different types of threads, including, but not limited to:
         • Acceptor thread, to accept new connections and delegate them.
         • Worker threads, one per session. It's common to use hundreds of worker threads.
         • Expiry thread, to evict old content from the cache.
  16. Installation and Basic Configuration
      Installation:
        • Debian/Ubuntu: apt from the repository repo.varnish-cache.org
        • FreeBSD: compile with FreeBSD ports
        • RedHat/CentOS/Fedora: yum from the EPEL repository
        • Solaris 10 and 11: compile with gmake
        • Mac OS X: compile with automake from MacPorts
      Configuration:
        • /etc/default/varnish or /etc/sysconfig/varnish: set the parameters of the binary:
            -P /var/run/varnishd.pid
            -a :80
            -T localhost:6082
            -f /etc/varnish/default.vcl
            -S /etc/varnish/secret
            -s malloc,3g
            -p thread_pools=4
            -p thread_pool_min=100
            -p thread_pool_max=1000
            -p thread_pool_add_delay=2
        • /etc/varnish/default.vcl: Varnish Configuration Language, initial configuration.
  17. Varnish Configuration Language – VCL Backends, Probes, Directors
        backend default {
            .host = "cdn1.edreams.com";
            .port = "80";
            # Health check: the backend is healthy when at least 2 of the last 5 probes succeed.
            .probe = {
                .url = "/engine/static-content/unversioned/html/blank.html";
                .timeout = 1s;
                .interval = 10s;
                .window = 5;
                .threshold = 2;
            }
            …
        }

        backend web1 {
            .host = "cdn1.edreams.com";
            .port = "80";
        }

        # Below is an example director based on round-robin requests
        import directors;

        sub vcl_init {
            new cluster1 = directors.round_robin();
            cluster1.add_backend(web1);  # Backend web1 defined above
        }
  18. Varnish Configuration Language – VCL functions or subroutines
      • vcl_recv
          - The first VCL function executed, right after Varnish has decoded the request into its basic data structure.
          - Modifying the client data to reduce cache diversity. E.g., removing any leading "www." in a URL.
          - Deciding caching policy based on client data. E.g., not caching POST requests, only caching specific URLs, etc.
          - Executing rewrite rules needed for specific web applications.
          - Deciding which web server to use.
      • vcl_fetch (renamed vcl_backend_response in Varnish 4.0)
          - By default it is designed to avoid caching anything with a Set-Cookie header. There are very few situations where caching content with a Set-Cookie header is desirable.
      • vcl_hash
          - Defines what is unique about a request.
          - Executed directly after vcl_recv.
      • vcl_hit
          - Right after an object has been found (hit) in the cache.
          - You can change the TTL or issue purge;
          - Often used to throw out an old object.
      • vcl_miss
          - Right after an object was looked up and not found in cache.
          - Mostly used to issue purge;
          - Can also be used to modify backend request headers.
      • vcl_pass
          - Run after a pass in vcl_recv OR after a lookup that returned a hitpass.
          - Not run after vcl_fetch.
      • vcl_deliver
          - Common last exit point for all (except vcl_pipe) code paths.
          - Often used to add and remove debug headers.
      • vcl_error (renamed vcl_synth in Varnish 4.0)
          - Used to generate content from within Varnish, without talking to a web server.
          - Error messages go here by default.
          - Other use cases: redirecting users (301/302 redirects).
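The kind of decision taken in vcl_recv can be modelled outside Varnish. The Python sketch below is only an analogy, not VCL: `recv_decision` is a hypothetical function, and the action names "hash" and "pass" are borrowed from VCL return values to label the outcome.

```python
from typing import Tuple


def recv_decision(method: str, host: str) -> Tuple[str, str]:
    """Normalize the host and decide whether the request may be looked up in cache."""
    host = host.lower()
    if host.startswith("www."):
        # Fewer host variants means fewer duplicate objects in the cache.
        host = host[4:]
    # Only GET and HEAD are looked up; everything else goes to the backend.
    action = "hash" if method.upper() in ("GET", "HEAD") else "pass"
    return action, host


print(recv_decision("GET", "www.example.com"))
```

Normalizing before the lookup matters because the host is part of the cache key built in vcl_hash.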
  19. Varnish Configuration Language – VCL Reference
      https://www.varnish-cache.org/docs/4.0/reference/vcl.html
      • Built-in functions: ban(expr), call(subroutine), hash_data(input), new(), return(), rollback(), synthetic(STRING), regsub(str,regex,sub), regsuball(str,regex,sub)
      • Perl-compatible Regular Expressions (PCRE)
      • Own subroutines: sub own_subroutine { … }
      • ACLs to group IPs or subnets
      • Probes (health checks)
      • Backend definitions
      • Import modules (VMODs)
      • Include statements: load/add a VCL configuration file
      • Integers, real numbers or strings
      • Operators: =, ==, ~, !, &&, ||
      • Conditionals: if, else, elseif
  20. Varnish Configuration Language – VCL Variables Availability
  21. VCL Graph
  22. Varnish Configuration Language – VCL Stale content
      Stale with revalidate
      Varnish serves stale content while fresh content is being fetched. Code:
        sub vcl_recv {
            set req.grace = 300s;
        }
        sub vcl_fetch {
            # In vcl_fetch the object is reached through beresp, not obj.
            set beresp.grace = 300s;
        }
      Stale with backend down
      Another e-commerce site called this behaviour "Nightmare mode". Varnish serves stale (outdated) content even when the backend is unreachable. Code:
        sub vcl_recv {
            if (req.backend.healthy) {
                set req.grace = 10s;
            } else {
                set req.grace = 2h;
            }
        }
        sub vcl_fetch {
            set beresp.grace = 2h;
        }
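The effect of the two grace values can be sketched as a small decision function. This is a simplified model in Python, not Varnish's real lookup code: `serve_decision` is a hypothetical name, and the defaults (10 seconds when healthy, 2 hours when the backend is down) mirror the values in the VCL above.

```python
def serve_decision(age: float, ttl: float, backend_healthy: bool,
                   grace_healthy: float = 10.0, grace_sick: float = 7200.0) -> str:
    """Return "hit", "stale" or "fetch" for an object of the given age (seconds)."""
    if age <= ttl:
        return "hit"  # still fresh
    grace = grace_healthy if backend_healthy else grace_sick
    if age <= ttl + grace:
        return "stale"  # expired, but still inside the grace window
    return "fetch"  # too old: a new copy is needed from the backend


print(serve_decision(100, 60, backend_healthy=False))
```

An object 40 seconds past its TTL is refetched when the backend is healthy but still served when it is down, which is the "Nightmare mode" behaviour.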
  23. Tools
      https://www.varnish-software.com/static/book/Appendix_A__Varnish_Programs.html
      • varnishtop – groups tags and the content of the tag together to generate a sorted list of the most frequently appearing tag/tag-content pair.
      • varnishncsa – prints the shmlog as an NCSA-style log (similar to Apache).
      • varnishstat – displays statistics from a running Varnish instance.
      • varnishhist – very useful (shows a histogram of request processing times).
      • varnishreplay – parses Varnish logs and attempts to reproduce the traffic.
      • varnishtest – script-driven program used to test Varnish Cache.
      • varnishadm – loads a different VCL configuration on the fly and bans/purges/invalidates cached content.
  24. Our VCL configuration and tools in cdn-own.edreams.com
      Connect by SSH2!
  25. VMOD Directory
      Community directory with Varnish modules: https://www.varnish-cache.org/vmods
      • Useful module: QueryString
        This module aims to become your Swiss Army knife to increase your hit ratio by tweaking the query string of your incoming requests. The plugin is still under development but it can already:
          - remove or clean the query string
          - filter specific query parameters based on a name list or a regexp
          - sort the query parameters
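Why sorting and filtering the query string raises the hit ratio can be shown with a short Python sketch. It does not use the QueryString VMOD or its API: `normalize_url` is a hypothetical helper, and the `utm_*` parameter names and the example.com host are only examples of values a site might choose to ignore.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, drop=("utm_source", "utm_medium", "utm_campaign")) -> str:
    """Drop ignorable parameters and sort the rest so equivalent URLs match."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k not in drop]
    params.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


print(normalize_url("http://example.com/?c=3&a=1&b=2"))
```

After normalization, `?c=3&a=1&b=2` and `?a=1&b=2&c=3` map to the same URL and therefore to the same cache key.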
  26. Tuning and best practices
      • Data size (Doc Link)
        Be aware that every object that is stored also carries overhead that is kept outside the actual storage area. So, even if you specify -s malloc,16G, Varnish might actually use double that. Varnish has an overhead of about 1 KB per object. So, if you have lots of small objects in your cache, the overhead might be significant.
      • Check system parameters (Doc Link)
        Be aware of all the parameters (e.g. shortlived, sess_workspace).
      • Storage backend in RAM (Doc Link): -s malloc
      • Shared memory log (shmlog) mounted in RAM on tmpfs (Doc Link)
      • Custom timers: connect_timeout, first_byte_timeout, between_bytes_timeout, send_timeout, sess_timeout, cli_timeout
      • Timing of thread growth: thread_pool_add_delay, thread_pool_timeout, thread_pool_fail_delay
      • Number of threads: thread_pool_min, thread_pool_max
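The per-object overhead is easy to put into numbers. The sketch below is back-of-the-envelope arithmetic only: `estimated_memory_bytes` is a hypothetical helper, and the default of 1024 bytes comes from the "about 1 KB per object" figure above, not from a measurement.

```python
from typing import Tuple


def estimated_memory_bytes(object_count: int, avg_object_bytes: int,
                           overhead_per_object: int = 1024) -> Tuple[int, float]:
    """Return (total bytes, overhead as a fraction of the stored payload)."""
    payload = object_count * avg_object_bytes
    overhead = object_count * overhead_per_object
    return payload + overhead, overhead / payload


total, ratio = estimated_memory_bytes(1_000_000, 512)
print(total, ratio)
```

With one million objects of 512 bytes each, the overhead is twice the payload, whereas for 100 KB objects it would be about 1%.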
  27. Proof of concept without HTTPS
      Without HTTPS = same conditions as other CDNs.
      Stress benchmark with 1 instance: we could stress this instance more, but we would need more resources to launch the "siege" command.
  28. Links and tags
      Links:
        https://www.mnot.net/cache_docs/
        https://www.varnish-cache.org/docs/4.0/users-guide/index.html
        https://www.varnish-cache.org/docs/4.0/reference/index.html
        https://www.varnish-software.com/static/book/
        https://marc.cortinasval.cat/blog/2013/12/17/varnish-cache-un-molt-bon-aliat-per-a-la-web/
      Tags: fresh, purge, invalidate, expire, stale, cache, cache key, hash, cdn, others…
  29. Thanks... Questions?