Sample code: https://github.com/cqsupport/webinar-dispatchercache
Webinar Recording: http://my.adobeconnect.com/p7th2gf8k43/
Optimizing dispatcher cache covering:
Best practices for using the dispatcher
Tips and tricks for improving performance
Common pitfalls to avoid
How to design your site so you get the most out of your Dispatcher
First, I will quickly review the basic concepts of what the dispatcher is and how it works. The purpose of this is to gain a solid understanding of how the dispatcher decides what to cache and how files get deleted or flushed from the cache. Then I'll cover some tips and tricks. Finally, I will cover some ways you can design your application to maximize how much content is cached by the dispatcher.
The Dispatcher is a web caching and load balancing tool that improves web content delivery for the Adobe CQ platform. For caching, the Dispatcher works as part of an HTTP server, such as Apache, with the aim of storing (or "caching") as much of the static content as possible, so that the website's layout engine is accessed as infrequently as possible. In a load balancing role, the Dispatcher distributes user requests (load) across multiple CQ instances (renders).
Here is a diagram demonstrating how the dispatcher typically fits in as part of your CQ architecture. Visitors request a file from your site via the dispatcher. If the dispatcher has not already cached the file, it connects to the configured CQ instance to retrieve the file. CQ responds with the file. The dispatcher caches the file and serves it back to the visitor. Subsequent requests for the file are then served from the cache.
Now that we have reviewed what the dispatcher is, you might be asking why you need it. Here are some reasons why it's recommended that you leverage the dispatcher for your site. It will help reduce load on your publishing tier, improve overall site performance, allow you to implement load balancing, and allow you to filter out unwanted traffic before it hits your publish instances. Some customers have asked us in support whether or not they can replace the dispatcher with a CDN. As CDNs provide a web caching mechanism similar to the dispatcher, this is possible. However, it is not recommended, as you will lose some important features that the dispatcher provides. Here are a few reasons why. Out of the box, the Dispatcher gives you more fine-grained control over how you delete or "flush" files from your cache. With a CDN, if you wanted the same degree of control, you would have to implement custom code to integrate with the CDN's web service API. In addition, this usually doesn't come as a free feature from the CDN provider. Also, since the dispatcher is installed on a web server, it adds the ability to rewrite URLs, block unwanted requests, and use SSI or "Server Side Includes" before the requests hit the publish instance.
Before I can go into the specifics of how the dispatcher works, we need to refresh ourselves on the constituent parts of a URL. If you recall, in the CQ and Sling world, the URL is broken up into 8 parts: the protocol, hostname, and port; the resource path, which is the part of the path that comes before the first period in the URL; selectors, which are period-delimited parameters; a file extension; a suffix path; and the querystring. Please take a look at the diagram on the slide to refresh your memory on these.
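To make the decomposition concrete, here is a minimal Python sketch. It assumes the simplified rule stated above (the resource path ends at the first period); real Sling resolution also consults the repository, which this sketch does not do. The function name and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def split_sling_url(url):
    """Split a URL into the eight parts described above (simplified)."""
    parts = urlsplit(url)
    # Resource path: everything in the path before the first period.
    resource_path, _, rest = parts.path.partition(".")
    # Selectors and extension run up to the next slash; the suffix follows.
    selectors_and_ext, slash, suffix_tail = rest.partition("/")
    tokens = selectors_and_ext.split(".") if selectors_and_ext else []
    return {
        "protocol": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port,
        "resource_path": resource_path,
        "selectors": tokens[:-1],          # period-delimited parameters
        "extension": tokens[-1] if tokens else "",
        "suffix": slash + suffix_tail,     # empty string when there is no suffix
        "querystring": parts.query,
    }


example = split_sling_url(
    "http://example.com:4503/content/foo.print.a4.html/some/suffix.pdf?x=1"
)
```

For the example URL, the resource path is /content/foo, the selectors are print and a4, the extension is html, and the suffix is /some/suffix.pdf.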
The main thing to consider in regards to caching when designing your site is: what files will the dispatcher actually cache? The dispatcher will cache files that meet the following criteria:
Rule number one, the URL must be allowed by the cache rules and filter sections of dispatcher.any.
Rule number two, the URL must have a file extension. For example, the first file listed would be cached. The second file would not.
Rule number three, the URL must not contain any querystring parameters, meaning there is no question mark in the URL. So the first file would not be cached and the second file would be cached.
Rule number four, if the URL has a suffix path, then the suffix path must have a file extension. The first file would be cached. The second file would not.
Rule number five, the HTTP method must be GET or HEAD. The first and second requests would be cached, but the third one would not.
Rule number six, the HTTP response status must be 200 OK (meaning the request was successful). The first HTTP response would be cached and the other two would not.
Rule number seven, the HTTP response must not contain the response header "Dispatcher: no-cache". Dispatcher no-cache is a special header which directs the dispatcher not to cache a particular request. In the case of the first HTTP response, the file would be cached. The second would not.
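The seven rules can be summarized as one check. This is a minimal Python sketch, not the dispatcher's actual implementation: rule one is reduced to a boolean the caller supplies, and the function and parameter names are hypothetical.

```python
import posixpath


def is_cacheable(url, method="GET", status=200, response_headers=None,
                 allowed_by_rules=True):
    """Apply the seven caching rules to one request/response pair."""
    headers = {k.lower(): v.lower() for k, v in (response_headers or {}).items()}
    path = url.split("?", 1)[0]
    _, dot, rest = path.partition(".")
    selectors_and_ext, slash, suffix = rest.partition("/")

    if not allowed_by_rules:                  # rule 1: cache rules and filters
        return False
    if not dot or not selectors_and_ext:      # rule 2: needs a file extension
        return False
    if "?" in url:                            # rule 3: no querystring
        return False
    if slash and "." not in posixpath.basename(suffix):
        return False                          # rule 4: suffix needs an extension
    if method not in ("GET", "HEAD"):         # rule 5: GET or HEAD only
        return False
    if status != 200:                         # rule 6: 200 OK only
        return False
    if headers.get("dispatcher") == "no-cache":
        return False                          # rule 7: Dispatcher: no-cache
    return True
```

For example, is_cacheable("/content/foo.html") is true, while the same URL with "?a=b" appended, a POST method, or a 404 status is not cacheable.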
In the case of URLs containing suffixes there are a few exceptions. The first exception is that if the contained resource path is already cached, then the suffixed URL would not be cached.
The second exception to the rule on suffixes is the opposite case: when a URL containing a suffix is already cached and the contained resource path is requested, the suffix file is deleted so that the resource path can be cached.
Now for a quick demonstration. Show /etc/httpd/conf/httpd.conf. Show the /etc/httpd/conf/dispatcher.any configuration. Show how dispatcher caching works. Request a file that is not in the cache and show that it appears. Show an example of invalidation.
You may be asking what you should be caching.
The answer is that you should cache everything that is cacheable. Here is the list of things you cannot cache: Requests where the content will change every time it is requested. Requests where the content is personalized to the user that is logged in. Requests that use selectors or suffixes where the URL has an infinite number of values and it is unlikely that users will request the same value more than once.
Now that you have an understanding of how the dispatcher caches content, I will explain how the dispatcher flushes and invalidates its cache.
Cache flushing is a mechanism that allows you to delete specific files from your dispatcher cache so that the next time they are requested they will be re-cached. If you look at the slide, you can see a sample dispatcher flush HTTP request. The HTTP header CQ-Path provides a path to the dispatcher, telling it what to remove from the cache. If you have a dispatcher flush agent configured, then these flush requests are done automatically by CQ during page activation.
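As a rough illustration of what such a flush request looks like on the wire, here is a small Python sketch that only builds the request text and does not send it. Only the CQ-Path header comes from the talk; the request URI /dispatcher/invalidate.cache and the CQ-Action and CQ-Handle headers are assumptions based on the usual default flush agent settings, and the host name is hypothetical.

```python
def build_flush_request(dispatcher_host, path, action="Activate"):
    """Return the raw text of a dispatcher flush request for one path."""
    lines = [
        # Assumed default flush URI; check your flush agent configuration.
        "POST /dispatcher/invalidate.cache HTTP/1.1",
        f"Host: {dispatcher_host}",
        f"CQ-Action: {action}",   # assumed header naming the replication action
        f"CQ-Handle: {path}",     # assumed header carrying the same path
        f"CQ-Path: {path}",       # tells the dispatcher what to remove
        "Content-Length: 0",
        "",                       # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


flush_request = build_flush_request("dispatcher.example.com", "/content/foo")
```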
Here's a diagram that demonstrates how a typical dispatcher flush works when the flushing agent is configured in an author instance. First, the author user activates a page, /content/foo. Second, the author instance simultaneously replicates the page to the CQ publish instances and flushes the dispatcher cache. When the dispatcher cache is flushed, it first touches the .stat file to update its timestamp, then deletes the files matching /content/foo.*, and finally deletes the _jcr_content directory under the foo directory. This folder contains any files included by components on the foo page. After this, when a user re-requests the page, the dispatcher fetches it from publish and caches it again.
This slide covers the process in further detail. The user activates the page. CQ sends the flush request to the dispatcher. The dispatcher receives this request, touches the .stat file, deletes all files matching the flush path, and deletes the _jcr_content directory under the flush path directory.
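The three file system steps can be sketched in Python as follows. This is an illustration of the sequence described above against a throwaway directory, not the dispatcher's own code; the function names and the sample cache layout are hypothetical, and a single .stat file in the cache root is assumed.

```python
import shutil
import tempfile
from pathlib import Path


def flush(docroot, flush_path):
    """Mimic a flush of flush_path inside the cache directory docroot."""
    docroot = Path(docroot)
    target = docroot / flush_path.lstrip("/")
    # Step 1: touch the .stat file so its timestamp is updated.
    (docroot / ".stat").touch()
    # Step 2: delete cached files matching <flush path>.* (foo.html, ...).
    if target.parent.is_dir():
        for cached in target.parent.glob(target.name + ".*"):
            if cached.is_file():
                cached.unlink()
    # Step 3: delete the _jcr_content directory under the flushed page.
    jcr_content = target / "_jcr_content"
    if jcr_content.is_dir():
        shutil.rmtree(jcr_content)


def demo():
    """Build a tiny sample cache, flush /content/foo, and return the root."""
    root = Path(tempfile.mkdtemp())
    (root / "content/foo/_jcr_content").mkdir(parents=True)
    for name in ("content/foo.html", "content/bar.html",
                 "content/foo/_jcr_content/par.html"):
        (root / name).write_text("cached")
    flush(root, "/content/foo")
    return root
```

After demo() runs, foo.html and the _jcr_content directory are gone, while bar.html is still in the cache.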
Cache invalidation is the process by which files in the cache are considered to be outdated so that after every flush request they are re-cached. Basically the process goes like this. The dispatcher first checks the dispatcher.any /cache invalidate rules to see that a particular file is allowed for invalidation. Next, the file's last modified time is compared to that of a .stat file. Finally, if the file is older than the .stat file, it will be re-cached.
Here is a diagram demonstrating how it works. A visitor requests a file from your site. The dispatcher checks whether foo.html is older than the last modified time of the .stat file. If the answer is yes, then the dispatcher contacts CQ publish for an updated version of the file. CQ responds with the file. The dispatcher caches the newer version of the file and serves it back to the visitor. Subsequent requests for the file are then served from the cached version until the .stat file is touched again by a flush.
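The timestamp comparison at the heart of this can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a single .stat file whose path the caller supplies.

```python
import os


def is_stale(file_mtime, stat_mtime, invalidate_allowed=True):
    """A cached file is stale when it is older than the .stat file."""
    if not invalidate_allowed:
        # Files not allowed by the invalidate rules are never auto-invalidated.
        return False
    return file_mtime < stat_mtime


def cached_file_is_stale(cached_file, stat_file, invalidate_allowed=True):
    """Same check, reading the modification times from disk."""
    if not os.path.exists(stat_file):
        return False  # no flush has happened yet
    return is_stale(os.path.getmtime(cached_file),
                    os.path.getmtime(stat_file),
                    invalidate_allowed)
```

A stale file is re-fetched from publish on the next request; a file that is newer than the .stat file keeps being served from the cache.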
Here is an example of a dispatcher.any invalidate configuration. This configuration allows all .html files to be invalidated. As you can see, the invalidate section is defined by a list of globbing patterns.
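To show how a list of globbing patterns can decide which files are invalidated, here is a Python sketch. The two rules (deny everything, then allow *.html) are an assumption about what the slide's configuration looks like, and the "last matching rule wins" evaluation is likewise an assumption about how the rules are applied; this is not the dispatcher.any syntax itself.

```python
from fnmatch import fnmatchcase

# Assumed rule list: deny everything, then allow all .html files.
INVALIDATE_RULES = [
    ("*", "deny"),
    ("*.html", "allow"),
]


def allowed_for_invalidation(path, rules=INVALIDATE_RULES):
    """Evaluate the rules in order; the last matching rule decides."""
    decision = "deny"
    for glob, rule_type in rules:
        if fnmatchcase(path, glob):
            decision = rule_type
    return decision == "allow"
```

With these rules, /content/foo.html is invalidated after a flush, while /content/dam/logo.png stays cached until it is flushed explicitly.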
The final step of understanding cache invalidation is to understand how stat files work. If you set the statfileslevel to zero, then you will only have a single .stat file in the root of your dispatcher cache. Every flush request will cause this file to be touched, and all files allowed by the invalidate rules will be affected by this.
Now, if you set the statfileslevel to 1, you will have a .stat file under each subdirectory. If you have statfileslevel set to 2, then you will also have .stat files under all folders at the next subdirectory level. In these cases, only the closest .stat file and its parent .stat files will be touched when a dispatcher flush occurs.
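To see which .stat files a given flush would touch, here is a small Python sketch of the rule as described above: the closest .stat file, bounded by statfileslevel, plus its parents up to the cache root. The function name and the example paths are hypothetical.

```python
def stat_dirs_to_touch(flush_path, statfileslevel):
    """Return the cache directories whose .stat file a flush would touch."""
    # Directories that contain the flushed resource, from the root down.
    dirs = [part for part in flush_path.split("/") if part][:-1]
    # .stat files only exist down to statfileslevel.
    depth = min(statfileslevel, len(dirs))
    # Level 0 is the cache root; each further level adds one directory.
    return ["/" + "/".join(dirs[:level]) for level in range(depth + 1)]
```

For a flush of /content/site/en/foo, level 0 touches only the root .stat file, while level 2 touches the .stat files in /, /content, and /content/site. Content under /content/othersite is then left alone.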
Now that you have an understanding of how the dispatcher caches content, I'll teach you a few tips and tricks to help you optimize your dispatcher.
Tip #1: Do cache flushing from publish
When the dispatcher flush agent is configured on an author instance, there is a possible race condition where the older version of a file can get re-cached. We can remedy this by configuring the dispatcher flushing agent on the publish instance instead. This way it is triggered only after a page or digital asset is modified on the publish instance.
To illustrate how the race condition happens: First, the author user activates a page, /content/foo. Second, the author instance simultaneously replicates the page to the CQ publish instances and flushes the dispatcher cache. When the dispatcher cache is flushed, it first touches the .stat file to update its timestamp, then deletes the files matching /content/foo.*, and finally deletes the _jcr_content directory under the foo directory. This folder contains any files included by components on the foo page. After this, if a user re-requests the page before replication to publish has completed, the dispatcher fetches the older version from publish and re-caches it.
Tip #1: Do cache flushing from publish
In dispatcher 4.0.9 a feature was added that allows you to optimize the way cache files are deleted by flush requests. With this feature, your flush request can specify a list of file paths that will be re-fetched from publish instead of being deleted. By default this version will re-fetch {path}.html for flushes that have no extension, and the full path of the file for DAM assets and other files that have extensions (such as css, js, etc).
Here is a diagram demonstrating what happens when a file is deleted from the cache by a normal flush request. A file is modified on the publish instance and this causes a dispatcher flush, which deletes the file foo.html from the cache. Multiple visitors request foo.html from your site. The first of these requests that is executed by the dispatcher starts a request to cache foo.html. All of the other requests that came in simultaneously also get proxied to publish because the cache file doesn't exist yet. Once the file is finally in the cache, the dispatcher will serve the cached version.
Tip #2: Cache your custom error pages
By default, custom error page content is served directly from the CQ publish instances because it gets served with the actual error response. To make such content cacheable, you can instead serve an empty response for errors and have the Apache web server decide what content to serve with the error. To do this, you will need to configure DispatcherPassError to tell Apache to handle the error responses, and configure Apache's ErrorDocument directive to tell it which URL to serve for each HTTP response error code.
Here are the steps for how it is done. Set DispatcherPassError to 1, so that error responses are passed to Apache rather than spooled to the client. Configure each of the ErrorDocument configurations for your site. Remove any custom errorhandler JSPs you already have under /apps, then overlay the /libs/sling/servlet/errorhandler JSPs to not do authentication checks.
Tip #5: Block unwanted requests in the /filter section
One way you can protect your publish instances from taking on excessive load is to filter out unwanted requests. You can do this by using the filter section in dispatcher.any. When configuring the filter section, if possible use a whitelist-style dispatcher /filter configuration, where you first deny all requests and then only allow valid requests to be served by the dispatcher.
Tip #6: Block unwanted requests in publish
To protect your publish instances from taking on excess load and to prevent caching of unwanted requests, it is recommended that you block invalid selectors, suffixes and querystrings. To do this, you will need to implement a javax.servlet.Filter that returns 403 or 404 in these cases.
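The decision such a filter has to make can be sketched independently of the servlet API. The following Python sketch shows only the validation logic, under assumed rules: a hypothetical allow-list of selectors, and a site that uses neither suffixes nor querystrings on these pages. The real filter would be written in Java and apply whatever rules fit your application.

```python
# Hypothetical allow-list; a real site would list the selectors it uses.
ALLOWED_SELECTORS = frozenset({"print", "mobile"})


def rejection_status(selectors, suffix="", querystring="",
                     allowed_selectors=ALLOWED_SELECTORS):
    """Return 404 for a request that should be blocked, otherwise None."""
    if any(selector not in allowed_selectors for selector in selectors):
        return 404  # unknown selector, e.g. a random cache-busting value
    if suffix:
        return 404  # assumed: this site does not use suffixes
    if querystring:
        return 404  # assumed: this site does not use querystrings
    return None     # request is valid; let it through to the page
```

Because blocked requests do not return 200 OK, they are also never written to the dispatcher cache.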
Now I will quickly demonstrate how this is done. Install the package. Review the code in CRXDE. Show the configuration of the filter.
Tip #7: Configure the dispatcher to ignore invalid querystrings
As I mentioned earlier, when a URL has a querystring, the page will not be cached. To prevent malicious users or referring sites from adding querystrings that could put unnecessary load on your site, you can configure rules to ignore all querystrings except the valid ones.
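The effect of such rules can be sketched in Python: if a request carries none of the valid parameters, its querystring is ignored and the request is treated like the plain, cacheable URL. The function name and the parameter names are hypothetical, and the sketch shows the idea rather than the dispatcher's configuration syntax.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def cache_lookup_url(url, valid_params):
    """Drop the querystring unless it contains a parameter the site uses."""
    parts = urlsplit(url)
    names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
    if names & set(valid_params):
        return url  # a valid parameter is present: not cacheable, goes to publish
    # Only ignorable parameters: treat the request as if it had no querystring.
    return urlunsplit(parts._replace(query=""))
```

For example, with "q" as the only valid parameter, /content/foo.html?utm_source=mail is served from the cached /content/foo.html, while /content/search.html?q=shoes still goes to publish.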
Now I will teach you how to plan your CQ implementation to maximize site cache-ability.
First of all, when planning a CQ project it is important to keep caching in mind during the design and development phases of the project.
Here are some tricks you can use for caching. In CQ, components can be requested directly by requesting their content path with an html extension in the URL. For example, see the URL on the slide. (Demo.) We can leverage this to implement a flexible caching mechanism by combining this concept with Ajax or SSI. This allows us to load the non-changing parts of a page from cache and load dynamic sections separately. This can be especially useful in sites with a lot of personalization. On the next slide, I will explain how to implement SSI in Apache web server.
To implement SSI in Apache, enable mod_include. Then configure the IncludesNOEXEC option to enable SSI and add the INCLUDES output filter for all html files. This tells Apache to process all html requests looking for SSI tags. In your application code, you could then make it so that components you want to load through SSI render an SSI tag, and your normal component code is served with an "ssi" selector. Here is some sample code showing the SSI tag.
Similarly, with Ajax you could follow the exact same approach. Here is some sample code that demonstrates how this might work.
Finally, if you wanted, you could even combine the two approaches. The code on the slide detects if the request is coming from the dispatcher. If it is, then it serves the request via SSI. If the request is coming from a user accessing the CQ instance directly, then it uses Ajax.
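The combined approach boils down to one branch. Here is a Python sketch of that decision, not the code from the slide. It assumes the dispatcher identifies itself to the render with a Server-Agent request header containing "Communique-Dispatcher"; the placeholder markup for the Ajax case and the component URL are hypothetical.

```python
def include_markup(component_url, request_headers):
    """Return an SSI tag behind the dispatcher, else an Ajax placeholder."""
    # Assumed detection: the dispatcher sets this header on proxied requests.
    agent = request_headers.get("Server-Agent", "")
    if "Communique-Dispatcher" in agent:
        # Apache resolves this tag when the cached page is served.
        return f'<!--#include virtual="{component_url}" -->'
    # Direct access to CQ: client-side script loads the component instead.
    return f'<div class="ajax-include" data-src="{component_url}"></div>'


ssi_markup = include_markup(
    "/content/foo/_jcr_content/par/teaser.ssi.html",
    {"Server-Agent": "Communique-Dispatcher"},
)
```

Either way, the surrounding page stays cacheable and only the dynamic component is resolved per request.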