High Performance Sites with Drupal and Cache Control Module


Published in: Technology

  • -CC: not going to go into details
  • -60 people, most of whom are developers
    -Janne Kalliola is the chair of the Business and Strategy track
  • -address of the project page; check out the code if you like
  • -I’ll be talking about Varnish, because it’s the most familiar to us (also tested with the nginx cache, which doesn’t support purges)
    -purges: consider listings on e.g. the front page (+ compare purges with the Columbia Law School Tag! session from before)
    -you can also select a TTL per path
  • -the context switch is made if the page is 1) set cacheable in the UI, 2) accessible by an anonymous user, 3) other details not worth mentioning
  • -some technical details: for each personalized component, store the function and arguments needed to generate the data, plus the HTML id
  • -some bootstrapping + heavy theming is avoided
    -get_components still requires a bootstrap and runs with the current user session (NOT CACHED)
    -in some cases, the site feeling faster is really just a feeling
  • -complex matter: using Varnish isn’t the only thing you need to do
    -“thinking in cache control”: what’s going to be personalized, can something be done differently, etc. Do this as early as possible!
    -custom code: call hook_cache_control for tagging components (unobtrusive; you can disable Cache Control at any time because of this), load some JS and CSS
    -why does the site look different? Because Drupal
  • -when we released Cache Control, one of the first questions was about ESI
  • -might have changed, haven’t checked for a while
  • -forum not Drupal
    -Cache Control disabled for admins
  • news, teams, results, statistics…
  • -no purges (except for automatic ones)
    -content propagation mostly handled with low TTLs (front page etc.)
  • -around since 1998 or so, this is the fourth incarnation, huge migration, 250 000 registered users, millions of nodes (threads, community pages, blog posts)
    -teenage girls really let you know if something’s wrong
    -every page has a lot of personalized components, which is a challenge
  • -forum listing, JSON backend
    -personalized content on the right
  • -almost one server: php-fpm partly offloaded to another server
    -custom code: JS/CSS loading, purging, redirects so that purges work (Cache Control at its worst)
  • -JSON backend: not directly related to Cache Control, but an example of the fact that Cache Control alone doesn’t solve your problems
    -mention front themer?
    -sysadmin: most Drupal devs are not in their comfort zone with Varnish, DB optimization, server configuration etc.
  • -MongoDB panic rewrite
    -Varnish loads
    -form cache + memcached -> problems with our number of forms (space just runs out) -> move to MySQL -> loads skyrocket
  • -glitches: redirects, messages, CSS/JS
    -don’t really know if this can be avoided due to the way Drupal handles things
  • -support forum is probably the most user-generated content there is
  • -very simple: lessons to use the tools
  • -in addition, Fastly has good coverage of nodes throughout the world
    -other options: Akamai – trouble with POST requests (?)
    -seems happy, but is not as happy as Jatkoaika: some custom code was needed
  • -it does this by manipulating Cache-Control HTTP headers (integration with Varnish), caches anonymous pages, personalizes on AJAX
    -easy = not that much personalized content (or personalized content), few purges; hard = custom code, personalized content, lots of purges, small JS/CSS glitches
    -geographical distribution = together with a suitable CDN
    1. High Performance Sites with Drupal and Cache Control Module

    2. Outline
       • Exove in brief
       • What is Cache Control and how does it work?
       • Easy case: Jatkoaika.com
         • Anonymous users, high read/write ratio
       • Hard case: Demi.fi
         • Authenticated users, low read/write ratio
       • Different case: Tekla Campus
         • Using Cache Control with a CDN
       • Discussion
    3. Exove in Brief
       Exove is a leading Northern European company specialising in open source web services design and development. We help companies conduct better business on the Internet through best-of-breed personnel and solutions.
       Quick facts:
       • Founded 2006
       • About 60 people
       • Served more than 120 clients
       • Offices in Finland, Estonia, and the UK
       • CEO Janne Kalliola
       • Meet us at booth 38
    4. WHAT IS CACHE CONTROL AND HOW DOES IT WORK?
       drupal.org/project/cache_control
    5. What is Cache Control?
       • Module for integrating your site with Varnish or some other HTTP cache
       • Sets appropriate Cache-Control headers in HTTP responses from Drupal
       • Also supports content purging
         • Automatic purges for e.g. node updates, hooks for custom purges
       • Comes with an admin UI for selecting cacheable menu router paths and a VCL file for Varnish
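As a rough illustration of the per-path header idea on the slide above, the sketch below maps router paths to TTLs and builds a Cache-Control value. The path table, the TTL values, the header strings and the function name are all hypothetical; the module's real configuration and output may differ.

```python
from typing import Dict

# Hypothetical per-path TTL table in seconds; not taken from the module.
CACHEABLE_PATHS: Dict[str, int] = {
    "node/%": 600,
    "taxonomy/term/%": 300,
    "<front>": 60,
}


def cache_control_header(router_path: str) -> str:
    """Return a Cache-Control value for a Drupal-style menu router path."""
    ttl = CACHEABLE_PATHS.get(router_path)
    if ttl is None:
        # "private" marks the response as not storable by shared caches.
        return "private, no-cache"
    # Assumption: s-maxage targets the shared cache (e.g. Varnish), while
    # max-age=0 keeps browsers from reusing their own copy unchecked.
    return f"public, max-age=0, s-maxage={ttl}"
```

A path that is not listed falls through to the non-cacheable branch, which mirrors the idea of selecting cacheable paths explicitly in an admin UI.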
    6. How does it work?
       • Varnish checks if the requested page is cached
         • If it is, Varnish sends it to the user’s browser (also for authenticated users!)
         • If it isn’t, the request is passed to Drupal, the page load is executed as an anonymous user, and the response is cached in Varnish
       • The response is processed in the user’s browser
         • For anonymous users, show the page as is
         • For authenticated users, generate the personalized parts in an AJAX back-end (get_components) and inject the results into the page
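The flow on the slide above can be sketched as a toy simulation. This is not Varnish or the module's code: `PageCache`, `process_in_browser` and the placeholder markup are hypothetical names and formats chosen for illustration, and the cache here has no TTLs or purges.

```python
from typing import Callable, Dict, Optional


class PageCache:
    """Toy stand-in for an HTTP cache such as Varnish."""

    def __init__(self, render_as_anonymous: Callable[[str], str]) -> None:
        self._render = render_as_anonymous
        self._store: Dict[str, str] = {}
        self.backend_hits = 0

    def get(self, path: str) -> str:
        if path not in self._store:
            # Cache miss: the back-end renders the page as an anonymous user.
            self.backend_hits += 1
            self._store[path] = self._render(path)
        # Hit or freshly stored: every visitor receives the same page.
        return self._store[path]


def process_in_browser(page: str, user: Optional[str],
                       get_components: Callable[[str], Dict[str, str]]) -> str:
    """Mimic the browser step: leave the page as is, or fill placeholders."""
    if user is None:
        return page
    for html_id, markup in get_components(user).items():
        page = page.replace(f'<div id="{html_id}"></div>',
                            f'<div id="{html_id}">{markup}</div>')
    return page


cache = PageCache(lambda path: f'<h1>{path}</h1><div id="greeting"></div>')
anonymous_view = process_in_browser(cache.get("front"), None, lambda u: {})
alice_view = process_in_browser(cache.get("front"), "alice",
                                lambda u: {"greeting": f"Hi {u}"})
```

Both views come from a single back-end render of the page; only the second one triggers the personalization call.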
    7. “Personalized content?”
       • You can enable Cache Control for any Drupal block – the block will be generated for authenticated users in the get_components back-end
       • Using Cache Control’s API, you can “tag” any part of the page to be generated in the get_components back-end
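The speaker notes mention that, for each personalized component, the function, its arguments and an HTML id are stored. A minimal sketch of that bookkeeping follows; `tag_component`, the registry layout and the placeholder markup are assumptions for illustration, not the module's actual API.

```python
from typing import Any, Callable, Dict, List, Tuple

# html id -> (generator function, extra arguments); layout is illustrative.
Registry = Dict[str, Tuple[Callable[..., str], Tuple[Any, ...]]]


def tag_component(registry: Registry, html_id: str,
                  func: Callable[..., str], *args: Any) -> str:
    """Record how to build a component later; return an empty placeholder."""
    registry[html_id] = (func, args)
    return f'<div id="{html_id}"></div>'


def get_components(registry: Registry, user: str,
                   html_ids: List[str]) -> Dict[str, str]:
    """Build every requested component for one user in a single call."""
    result: Dict[str, str] = {}
    for html_id in html_ids:
        func, args = registry[html_id]
        # The generator runs with the current user, so the output is personal.
        result[html_id] = func(user, *args)
    return result
```

The cached page only ever contains the placeholder, so the same cached copy can be served to every user.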
    8. Benefits of Cache Control
       • Only the needed parts are loaded: the back-end is significantly less burdened
       • All personalized parts of the page are loaded in a single request
       • The user is given something to look at while the hard parts of the page are being loaded – the site feels faster
    9. What’s the catch?
       • Building high-performance sites is a complex matter. Cache Control is not a magic bullet to solve all your performance issues
       • While developing, you have to “think in Cache Control” or you’ll be in a world of trouble
       • You will most likely end up writing at least some custom code and spending time wondering why the site behaves differently when Cache Control is enabled
    10. What about ESI?
       • ESI (Edge Side Includes) is a partial loading technique supported by Varnish and some CDNs, e.g. Akamai
       • It basically makes Varnish do the partial page loading
         • Varnish first fetches the common version from cache
         • Then it looks through the page for any ESI markup
         • Then it loads all the ESI-marked parts of the page from cache or from Drupal
    11. How does Cache Control differ from ESI?
       • ESI needs to wait until the whole page is loaded before giving anything to the user
       • ESI loads all the portions of the page (still in D7, this might change in D8) in separate HTTP requests, thus burdening the server with even more bootstraps than without any cache
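To make the bootstrap comparison concrete, here is a deliberately simplified counting model of the slide's claim. It assumes every ESI fragment is personalized and therefore uncacheable, and that each back-end request costs one Drupal bootstrap; the function and its technique labels are hypothetical.

```python
def backend_bootstraps(technique: str, personalized_parts: int,
                       page_cached: bool = True) -> int:
    """Count back-end requests for one page view by an authenticated user."""
    if technique == "none":
        # No HTTP cache: one full page request builds everything inline.
        return 1
    page_request = 0 if page_cached else 1
    if technique == "esi":
        # Assumption: each ESI fragment is fetched with its own request.
        return page_request + personalized_parts
    if technique == "cache_control":
        # One get_components call returns every personalized part together.
        return page_request + (1 if personalized_parts else 0)
    raise ValueError(f"unknown technique: {technique}")
```

Under these assumptions, a cached page with five personalized parts costs five bootstraps with ESI and one with Cache Control or with no cache at all. The Cache Control bootstrap is lighter than a full uncached page load, since only the personalized parts are generated.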
    12. THE EASY CASE
    13. Jatkoaika.com
       • Jatkoaika.com is the leading ice hockey site in Finland
       • 200 000 unique visitors and 1.6M page loads per week
       • Page loads in Drupal are almost exclusively done by anonymous users
       • Content is read a lot more often than written, making the site an ideal use case for Cache Control
    14. Jatkoaika.com – Setup
       • Drupal, MySQL, SOLR, memcached, Varnish – all running on one server
       • Cache Control enabled for all content pages (nodes, taxonomy terms, front page) with different TTLs – no custom code required
       • Server loads are minimal
    15. THE HARD CASE
    16. Demi.fi
       • Demi.fi is the community around the Demi magazine, targeted at teenage girls
       • 2.8M weekly page views
       • Most page loads done by authenticated users
       • 1 300 – 1 500 logged-in users during busy hours
       • The users generate a lot of content (forum posts, comments, etc.)
       • Keeping the cache up to date is a challenge
    17. Demi.fi – Setup
       • Drupal, MySQL (Percona), SOLR, MongoDB, nginx + php-fpm, memcached, Varnish – all running on (almost) one server
       • Cache Control enabled for almost all user-facing pages and some AJAX backends as well
       • A lot of personalized components per page, putting strain on the get_components back-end
       • Quite a lot of custom code required to make the site compatible and to trigger cache purges when needed
       • Server loads are significant but mostly tolerable
    18. Demi.fi – Strategy
       • Avoid Drupal bootstrap and theming
       • Cache Control: try to keep as much content in the Varnish cache as possible
       • Fast JSON-based backends for data that changes often (e.g. forum topic listings): offload theming to users’ browsers. Use Cache Control to cache the results with a short TTL (30 secs or so)
       • Use fast storage: SOLR for Views, MongoDB for field storage, memcached for cache
       • Get a good sysadmin
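The short-TTL idea from the strategy slide can be sketched with a small in-process cache. On the real site this role is played by Varnish; the class below, its name and the injectable clock are assumptions made so the behaviour is easy to follow and test.

```python
import time
from typing import Callable, Dict, Tuple


class ShortTtlCache:
    """Cache values for a short time so fast-changing data stays fresh."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def fetch(self, key: str, build: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Still fresh: serve the stored copy without touching the back-end.
            return entry[1]
        # Missing or expired: rebuild once and remember when it was built.
        value = build()
        self._entries[key] = (now, value)
        return value
```

With a 30-second TTL, a busy forum listing is rebuilt at most about twice a minute no matter how many users request it, at the cost of showing data that can be up to 30 seconds old.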
    19. Demi.fi – Lessons Learned
       • Cache Control’s get_components back-end needs to be fast
         • Cache Control now supports MongoDB as storage backend
       • Cache Control’s front-end needs to be fast
         • We had to rethink how to manipulate the page that has lots of personalized content
       • Continuous cache purging can also be a performance issue
         • Varnish 3.0-style bans take up a lot of resources, use purges (2.0-style bans) instead
    20. Demi.fi – More Lessons
       • Building high-performance sites is hard, and it gets harder if you don’t take performance into account from the very beginning
         • This includes design: be aware of the performance cost of displaying a certain piece of content on a page, identify and mitigate potential performance killers
       • Cache Control is far from perfect and doesn’t alone solve your problems
         • Ironing out small glitches with e.g. cache purging can be a lot of work –
    22. Tekla Campus
       • Tekla Campus is an e-learning tool and community for engineering and construction students
       • Users come from all over the world
       • Almost all of them are authenticated
       • Not that much user-generated content, moderate amount of personalized content for logged-in users
    23. Tekla Campus – Setup
       • The site is hosted in Finland, but the user base is spread all over the world
       • To mitigate latency, we needed a CDN solution
       • Turns out Fastly CDN uses Varnish, so we decided to give it a go
       • Cache Control plays nicely with Fastly, even cache purges work out of the box
       • Fastly even allows you to upload your own VCL
    24. SUMMARY
    25. Summary
       • Cache Control is a module for integrating your site with e.g. Varnish. It works for both anonymous and authenticated users
       • It can help make your site a lot faster
       • It can be easy or hard, depending on the complexity of your site
       • You can also use it to help with geographical distribution of your site
    26. THANK YOU! WHAT DID YOU THINK?
       Locate this session at the DrupalCon Prague website: http://prague2013.drupal.org/schedule
       Click the “Take the survey” link