This presentation covers Adobe AEM Dispatcher security, CDN caching, and browser caching.
This presentation is the second part of a webinar on AEM Dispatcher:
http://dev.day.com/content/ddc/en/gems/dispatcher-caching---new-features-and-optimizations.html
Visit the URL above to view the whole presentation. Dominique Pfister, the primary engineer developing AEM Dispatcher, covers the first part on new features.
Thanks, Dominique. Hi, my name is Andrew Khoury, and today I’ll be covering
some basic tips on how to secure your dispatcher
and how to leverage a CDN and client-side browser caches to improve your site beyond what dispatcher provides.
Before watching this presentation, you should already have a basic understanding of
the HTTP protocol,
Apache HTTP Server configuration,
and an understanding of what the AEM dispatcher is and how to use it.
Before securing dispatcher, here are some things you can do to make Apache HTTP Server a little more secure:
First of all, keep your Apache server binaries up to date, as security patches are released all the time.
Be aware of the latest Apache security reports.
Limit the files and directories that the Apache user has access to.
If you are not using .htaccess files, then disable them.
If you are using SSI, then use IncludesNOEXEC instead of Includes so that SSI calls cannot execute commands in the operating system shell.
Disable user directories, as they can sometimes expose information that we don’t intend to share.
Block directory listing in Apache to prevent users from exploring the server.
Disable any Apache modules that you are not using.
Use mod_security or some other intrusion detection and prevention system.
This diagram shows a basic AEM architecture.
You can use this diagram as a reference for how you would configure your firewalls.
The idea here is to only allow traffic to flow in the direction it needs to and over the ports it needs to.
When configuring your firewalls, keep in mind that
if you are not disabling the link checker, then you will need to allow all
outbound TCP/IP connections from author and publish instances.
If you plan on disabling the link checker but need to integrate with Adobe’s
Cloud services, then you can refer to the Adobe knowledge base for a list of IP
addresses to allow outbound connections to.
If you are familiar with basic AEM architecture
then you know that your web server and dispatcher
are your last line of defense before a request can
reach the publish instances.
Because of this, it is important to lock down your dispatcher and block as much unwanted traffic as possible before it reaches the publish instances.
As a first step in locking down your dispatcher, you should always keep the dispatcher binary up to date with the latest security fixes.
The next important thing is to
create a strong set of filter rules in your dispatcher.any file. Filter rules will help
you keep bad traffic from reaching your publish instances.
When implementing the filter rules, it is best to use a whitelist.
This means that you deny all first, then
only allow the requests you need for your site to function properly.
When creating allow rules, be specific about the request methods and URL patterns you want to allow.
For deny rules, be as general as possible to block all variations of bad requests.
Also, if you use the vanity URL feature in AEM, then in order to implement a whitelist you will need to leverage the new dispatcher feature that Dominique covered earlier.
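To make the deny-first idea concrete, here is a minimal sketch in Python. It is not dispatcher.any syntax and not how the dispatcher is implemented; it only models a rule list where the last matching rule decides, which is my understanding of how dispatcher filters are evaluated. The rule set, the patterns, and the is_allowed helper are all made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule set: (action, method pattern, URL pattern).
# Order matters: the first rule denies everything, later rules open
# up only what the site needs, and a final broad deny rule blocks a
# known-bad pattern even inside an allowed path.
RULES = [
    ("deny", "*", "*"),
    ("allow", "GET", "/content/*.html"),
    ("allow", "GET", "/etc/clientlibs/*"),
    ("deny", "*", "*.infinity.json"),
]


def is_allowed(method, url, rules=RULES):
    """Return True if the last rule matching the request is an allow rule."""
    allowed = False  # nothing is allowed unless a rule says so
    for action, method_pattern, url_pattern in rules:
        if fnmatchcase(method, method_pattern) and fnmatchcase(url, url_pattern):
            allowed = action == "allow"
    return allowed
```

With these example rules, a GET for /content/site/en.html passes, while a POST to the same URL, a GET for /system/console, and any URL ending in .infinity.json are all rejected. Note how the allow rules name a specific method and path, and the deny rules use broad wildcards.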
(Show dispatcher.any file) Now I’ll quickly show you my dispatcher.any filter rules.
After configuring filter rules, the next thing to consider is authentication.
If your site doesn’t allow users to log in, then block users from authenticating against Experience Manager.
To do this, block HTTP basic auth by listing all allowed request headers in /clientheaders and omitting the Authorization header.
Then block AEM token auth by filtering out all requests for j_security_check.
For additional security, you could also block any request methods that are not supported by the site at the Apache level in your httpd.conf using the LimitExcept directive.
Even with the best filter rules, you cannot filter out all invalid request patterns.
So to protect the dispatcher further, there are a few things that can help.
Make sure that your error responses, such as 404 Not Found, return the correct error codes and don’t return status 200.
Cache your custom error pages in the dispatcher cache by configuring the DispatcherPassError feature.
Return 403 or 404 for bad querystrings or selectors in URLs.
This can be done by using the open source cq-urlfilter tool or by implementing your own javax.servlet.Filter that blocks the unwanted traffic.
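The check such a filter performs can be sketched roughly as follows. This is Python rather than a real javax.servlet.Filter, and it does not reflect how cq-urlfilter works. The allowed selector and parameter names are invented, and the parsing is simplified: it treats everything between the resource name and the extension in the last path segment as selectors and ignores suffixes.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical allow lists; a real site would derive these from the
# selectors and parameters its components actually support.
ALLOWED_SELECTORS = {"print", "mobile"}
ALLOWED_PARAMS = {"q", "page"}


def status_for(url):
    """Return 404 for unknown selectors or query parameters, else 200."""
    parts = urlsplit(url)
    last_segment = parts.path.rsplit("/", 1)[-1]
    pieces = last_segment.split(".")
    # pieces looks like [name, selector..., extension]
    selectors = pieces[1:-1] if len(pieces) > 2 else []
    if any(selector not in ALLOWED_SELECTORS for selector in selectors):
        return 404
    params = parse_qsl(parts.query, keep_blank_values=True)
    if any(name not in ALLOWED_PARAMS for name, _ in params):
        return 404
    return 200
```

Under these assumptions, /content/site/en.print.html is served normally, while /content/site/en.infinity.json and /content/site/en.html?debug=layout both get a 404 and so never produce a cacheable 200 response.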
One other setting that can help protect your publish instances is to set the serveStaleOnError flag. This flag tells the dispatcher to serve whatever cached files it has in case all publish instances are inaccessible.
Additionally, to protect against false dispatcher flushes we should always set /allowedClients with IP addresses of the publish instances to restrict which servers can perform dispatcher flushes.
If your site has any expensive requests, such as RSS feeds or large site maps, then it might make sense to exclude those requests from the dispatcher cache rules and use a periodic script to cache those files instead.
To do this, you would block the URL in the /filter section of dispatcher.any and use a script like the one on this slide to handle re-requesting and re-caching the file.
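The script from the slide is not reproduced in these notes. As a rough stand-in, here is a minimal Python sketch of the same idea, meant to be run periodically (for example from cron). The URL and cache path you would pass in are assumptions, and recache and fetch are illustrative names, not part of any AEM tooling.

```python
import os
import tempfile
import urllib.request


def fetch(url, timeout=30):
    """Fetch the expensive resource directly from a publish instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        if response.status != 200:
            raise RuntimeError("unexpected status %s for %s" % (response.status, url))
        return response.read()


def recache(url, cache_path, fetcher=fetch):
    """Re-request url and atomically replace the file in the docroot.

    If the fetch fails or returns an empty body, the previously cached
    file is left untouched, so visitors keep getting the old copy.
    """
    body = fetcher(url)
    if not body:
        raise ValueError("empty response for %s" % url)
    directory = os.path.dirname(cache_path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file in the same directory, then rename over the
    # old file so the web server never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(body)
        os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
        os.replace(tmp_path, cache_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return len(body)
```

A scheduled job could call something like recache("http://publish-host:4503/content/site/feed.rss", "/var/www/html/content/site/feed.rss"), where both the host and the docroot path are placeholders for your own environment.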
If you don’t use querystrings in your site, then set /ignoreUrlParams to allow requests with querystrings to get cached. This feature basically lets you specify rules for which querystring parameters the dispatcher should ignore when deciding whether a request can be cached and served from the cache.
Finally, one thing you can do to prevent running out of Apache threads is to set the connection /timeout in the /renders section of dispatcher.any in case requests are hanging, waiting on the publish instances.
The next step in keeping your site running smoothly is to leverage other upstream caches such as browser caches and CDNs.
First I’ll start with CDNs.
If you are not familiar with CDNs, they are large distributed networks of cache servers that optimize content delivery using geographical proximity.
To manage your cached content within a CDN, most of the time people rely on Cache-Control headers or manually configured TTLs to control the freshness of the cache.
Some CDN providers, such as Akamai, support on-demand flush requests.
A CDN in our case is yet another way of reducing the amount of traffic that reaches the back-end.
When integrating a CDN with AEM and dispatcher there are multiple options.
We can…
(read numbers)
Here are the pros and cons
One issue that can come up when integrating a CDN into your AEM architecture is that non-cacheable requests respond with the headers set by AEM, not Apache. This presents an issue when the response is served without a Cache-Control header, as some CDNs cache these responses.
The solution is to set Cache-Control headers at the AEM level so that even a non-cacheable file is served with the correct headers.
When integrating a CDN with your AEM instances, it is nice to be able to
cache JS, CSS, and other static files for a very long time by using a unique URL per version,
and to be able to implement domain sharding.
Domain sharding is basically where you use multiple subdomains pointing to your CDN to serve resources such as images, JS, and CSS. By using multiple domains, the browser is able to download more files in parallel and the page will load faster.
To do this, the Adobe consulting team has implemented two tools.
The first is called versioned clientlibs, which adds a unique identifier to clientlib URLs.
The second is called static reference rewriter, which rewrites certain URLs to point to a different domain. It also supports domain sharding.
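To illustrate the two ideas, here is a small Python sketch. It is not the code of either tool. It assumes a content hash as the unique identifier and a checksum of the path to pick the shard, and the host names are placeholders.

```python
import hashlib
import zlib

# Hypothetical CDN subdomains used for sharding.
SHARD_HOSTS = ["static1.example.com", "static2.example.com"]


def versioned_url(path, content):
    """Insert a short content hash before the file extension.

    The URL changes only when the file content changes, so each
    version can be cached for a very long time.
    """
    digest = hashlib.sha256(content).hexdigest()[:12]
    base, dot, extension = path.rpartition(".")
    if not dot:
        return "%s.%s" % (path, digest)
    return "%s.%s.%s" % (base, digest, extension)


def sharded_url(path, hosts=SHARD_HOSTS):
    """Map a path to one of the shard hosts, always the same one.

    A stable mapping matters: if the same file moved between hosts on
    different pages, browsers and the CDN would cache it twice.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return "//%s%s" % (hosts[index], path)
```

For example, versioned_url("/etc/clientlibs/site.css", css_bytes) yields a path of the form /etc/clientlibs/site.<hash>.css, and sharded_url("/content/dam/logo.png") returns the same protocol-relative URL on one of the shard hosts every time it is called.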