When dispatcher caching is not enough…
Jakub Wądołowski
Senior Systems Engineer @ Cognifide
 The What
 The Why
 The How
Agenda
The What
It all started in 2012…
www.flickr.com/photos/nasahqphoto/16327416694
To be perfectly honest, initially it was rather like that…
www.flickr.com/photos/garryknight/5703519506
The client
 EU pharmaceutical company
 75 offices across the globe
 Over 40 000 employees
 Medical products available worldwide (180+ countries)
www.flickr.com/photos/worak/2258271659
 Country specific brochureware websites for medical products
 iPad app for sales representatives
 Single point for content entry
 Multiple integration points (SSO, user/device authentication, etc.)
 CQ 5.5, upgrade to AEM 6.1 in progress
Requirements
Main components
 Brochureware website
 iPad app
 AEM Authoring
 Single datacenter in London (Rackspace)
 REST-like API for iPad app
 Integrations with local and remote services
Logical architecture
Initially it was just Spain, Argentina and Sweden
6 months later the number of countries had tripled
It eventually reached 21, and it is still not over
The Why
“Our team in Argentina complains that the app feels slow. They can’t download
presentations sometimes. Could you please investigate that?”
Mr B.
www.flickr.com/photos/r4vi/8640618489
 Latency, latency, latency…
 Way too high round trip times (RTT)
 Timeouts
 Broken streams
 Connection resets
 Poor Internet connections in some areas
Problems
Solutions
It has been decided that Hong Kong is the way to go for us
There’s over 10 000 km between London and Buenos Aires…
…which is nearly the same distance as between London and Hong Kong
 Client-server problems became server-server ones
 How are we going to sync all the changes (both ways)?
 What about deployments?
 Do we have enough licenses?
 What’s the best way to implement content sharding?
 How long will it take to implement all of these things?
When initial excitement was gone…
www.flickr.com/photos/geishaboy500/2496995573
PoC conclusion
 We can’t just cache more on dispatcher
 This is a very well known problem
 Let’s use the right tool to solve the problem the right way
 Content Delivery Network (CDN) is the way to go!
The road to CDN
“(…) CDN is a large distributed system of servers deployed in multiple data centers across the
Internet. The goal of a CDN is to serve content to end-users with high availability and high
performance. CDNs serve a large fraction of the Internet content today (…).”, Wikipedia
CDN definition
AEM + CDN
www.flickr.com/photos/pictures-of-money/16678590844
CDN, huh?
That's not necessarily true nowadays…
www.flickr.com/photos/halfrain/14410890555
 Pay-as-you-go model
 Powered by Varnish
 Highly customizable (ability to upload your own VCL)
 150 ms to purge – globally
 ~5 sec to change a config through the web API
 SSD-powered servers connected to Tier 1 networks
 Real-time insight into what’s happening (graphs, logs, etc.)
 Great support
Why Fastly?
https://www.fastly.com/network
Still not convinced?
The How
Ok… how should I start?
www.flickr.com/photos/kleuske/8004416109
www.flickr.com/photos/martinbamford/5638834940
The logs!
 grep, awk, sed - all of these are your friends
 Count your requests
 Leverage the power of log monitoring tools (ELK, Splunk, etc.)
 Plan your content structure carefully
Logs and content structure
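A minimal sketch of the "count your requests" step, assuming access logs in an Apache-style format. The sample log lines, the `normalize` helper and the idea of collapsing API version segments are illustrative assumptions, not taken from the project; the same counting can be done with grep, awk and sed.

```python
import re
from collections import Counter

# Hypothetical access-log lines (Apache common log style), for illustration only.
SAMPLE_LOG = """\
10.0.0.1 - - [01/Jan/2015:10:00:00 +0000] "GET /bin/myapp/v1/a_string.json HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2015:10:00:01 +0000] "GET /content/something/es/_jcr_content.zip HTTP/1.1" 302 0
10.0.0.1 - - [01/Jan/2015:10:00:02 +0000] "GET /bin/myapp/v2/a_string.json?x=1 HTTP/1.1" 200 498
10.0.0.3 - - [01/Jan/2015:10:00:03 +0000] "POST /bin/myapp/v1/a_string.json HTTP/1.1" 201 12
"""

# Extracts method, path and status code from the quoted request part of a line.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def normalize(path: str) -> str:
    """Drop the query string and collapse API version segments (/v1/, /v2/, ...)."""
    path = path.split("?", 1)[0]
    return re.sub(r"/v\d+/", "/v*/", path)


def count_requests(log_text: str) -> Counter:
    """Count requests grouped by (method, normalized path, status)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REQUEST_RE.search(line)
        if match:
            key = (match["method"], normalize(match["path"]), match["status"])
            counts[key] += 1
    return counts


counts = count_requests(SAMPLE_LOG)
for (method, path, status), n in counts.most_common():
    print(n, method, path, status)
```

Grouping by method, normalized path and status makes recurring request patterns stand out before any caching rule is written.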
Look for patterns
www.flickr.com/photos/wwarby/4915777722
 If it is a GET request and starts with /bin/myapp/v[1-2]/a_string.json then it is X
 All requests to /content/something/*/_jcr_content.zip end with 302 to /some/path/to/file.zip
Request patterns
Assign these patterns to multiple buckets
www.flickr.com/photos/ddebold/15991919514
 Public content
 Private content
 Content available for authorized users only
Content groups/buckets
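Assigning request patterns to buckets could be sketched as an ordered rule table. The first two patterns come from the request pattern examples above; the third pattern, the bucket names and the bucket assigned to each pattern are assumptions made purely for illustration.

```python
import re
from typing import Optional

# (HTTP method, path regex, bucket). Bucket assignments are illustrative only.
RULES = [
    ("GET", re.compile(r"^/bin/myapp/v[1-2]/a_string\.json"), "private"),
    ("GET", re.compile(r"^/content/something/.+/_jcr_content\.zip$"), "authorized"),
    # Hypothetical pattern for static, publicly available assets.
    ("GET", re.compile(r"^/static/.*\.(png|css|js)$"), "public"),
]


def classify(method: str, path: str) -> Optional[str]:
    """Return the bucket of the first matching rule, or None if nothing matches."""
    for rule_method, pattern, bucket in RULES:
        if method == rule_method and pattern.search(path):
            return bucket
    return None


print(classify("GET", "/bin/myapp/v1/a_string.json"))
print(classify("GET", "/content/something/es/_jcr_content.zip"))
print(classify("POST", "/bin/myapp/v1/a_string.json"))
```

Requests that match no rule return `None`, which a real configuration would most likely treat as uncacheable and pass straight to the origin.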
 Reverse HTTP proxy
 In-memory time based cache
 Blazing-fast
 Big “state” machine
 Varnish Configuration Language (VCL)
 Full control of HTTP flow
Varnish in 1 slide!
 Cacheable methods: GET, HEAD
 Cacheable response codes:
 200, 203
 300, 301, 302
 404, 410
 Responses with “Cache-Control: private” are not cached unless configured otherwise
General caching rules
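The rules above can be expressed as a small predicate. This is a sketch, not the CDN's actual decision logic: it reads the last rule as "a response marked Cache-Control: private is skipped unless a custom rule overrides that", and the function name and signature are made up for the example.

```python
CACHEABLE_METHODS = {"GET", "HEAD"}
CACHEABLE_STATUSES = {200, 203, 300, 301, 302, 404, 410}


def is_cacheable(method: str, status: int, cache_control: str = "") -> bool:
    """Apply the default rules: method, status code, then Cache-Control: private."""
    if method.upper() not in CACHEABLE_METHODS:
        return False
    if status not in CACHEABLE_STATUSES:
        return False
    # Split the header into lower-cased directives, e.g. "private, max-age=0".
    directives = {d.strip().lower() for d in cache_control.split(",")}
    return "private" not in directives


print(is_cacheable("GET", 200, "max-age=600"))
print(is_cacheable("GET", 200, "private, max-age=600"))
print(is_cacheable("POST", 200))
print(is_cacheable("GET", 500))
```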
Let’s start with the iPad app
www.flickr.com/photos/pestoverde/15048774061
 3 request types
 REST API request
 Presentation request (ZIP files)
 Image request
iPad – HTTP flows
 2 content groups
 Private
 For all authorized users
 8 request patterns
 TTL varies from 10 minutes to 7 days
 35/65 dynamic/static content (frequently changing JSON files vs PDFs/PNGs)
 All REST API responses are private
iPad app content
 Private content is cacheable
 What makes HTTP response private?
 It is tied to a user session; in other words, the HTTP request carries a unique authorization cookie
Private content
www.flickr.com/photos/hyku/368912557
Is it really safe to cache that type of content?
 Varnish cache is a key-value store
 Default key: req.url + req.http.host
 req.url + req.http.host + sessionId = private cache space - voila!
Private cache
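The private cache idea can be illustrated with a plain dictionary standing in for the key-value store. This is a sketch only: the host name, URL, session IDs and the `cache_key` helper are hypothetical, and a real setup would build the key in VCL rather than in Python.

```python
import hashlib
from typing import Dict, Optional


def cache_key(url: str, host: str, session_id: Optional[str] = None) -> str:
    """Hash url + host, plus the session ID when the content is private."""
    parts = [url, host]
    if session_id is not None:
        parts.append(session_id)
    # A separator avoids ambiguity between e.g. ("/a", "b") and ("/ab", "").
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


cache: Dict[str, bytes] = {}
url, host = "/bin/myapp/v1/a_string.json", "example.com"

# Each session gets its own cache slot for the same URL.
cache[cache_key(url, host, "session-alice")] = b'{"user": "alice"}'
cache[cache_key(url, host, "session-bob")] = b'{"user": "bob"}'

print(cache[cache_key(url, host, "session-alice")])
print(cache.get(cache_key(url, host, "session-carol")))
```

Because the session ID is part of the key, one user's cached response is never served to another user; the cost is one cached copy per session.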
Dynamic means uncacheable?
www.flickr.com/photos/gsfc/7402445224
 Caching usually comes with a trade-off
 Updates won’t be instantaneous
 TTL has to expire, or
 a purge request has to be triggered
 CDN is the way to go if you accept this delay
Dynamic content
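The two ways an update becomes visible (TTL expiry or an explicit purge) can be sketched with a tiny time-based cache. The class and its method names are hypothetical and only model the behaviour described above; the clock is injectable so the example does not need to sleep.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """Minimal time-based cache: entries vanish on TTL expiry or on purge."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, body: bytes, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, body)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            # TTL expired: drop the entry so the next request goes to the origin.
            del self._entries[key]
            return None
        return body

    def purge(self, key: str) -> None:
        self._entries.pop(key, None)


# Fake clock so the demo is deterministic.
now = [0.0]
demo = TtlCache(clock=lambda: now[0])
demo.put("/api/list.json", b"old", ttl_seconds=600)

now[0] = 599.0
print(demo.get("/api/list.json"))  # still served from cache
now[0] = 600.0
print(demo.get("/api/list.json"))  # expired, origin has to be asked again
```

Until one of the two events happens, clients keep receiving the old body, which is the delay that has to be acceptable.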
Content purging
www.flickr.com/photos/librariesrock/13522859053
 Fastly exposes purge REST API
 Purge URL
 Purge Key
 Purge all assets marked with special “label”
 https://www.fastly.com/blog/surrogate-keys-part-1
 Purge All
 Purge vs Soft Purge
 https://www.fastly.com/blog/introducing-soft-purge
Content purging
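The purge variants listed above can be modelled in memory to show how they differ. This sketch does not call the Fastly API; the `EdgeCache` class, its methods and the sample URLs and labels are invented for illustration. It assumes that a soft purge marks an object as stale instead of evicting it, which matches how the linked blog post describes the feature.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Entry:
    body: bytes
    surrogate_keys: Set[str] = field(default_factory=set)
    stale: bool = False


class EdgeCache:
    """Toy model of purge by URL, purge by key (label) and purge all."""

    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def store(self, url: str, body: bytes, surrogate_keys: Iterable[str] = ()) -> None:
        self.entries[url] = Entry(body, set(surrogate_keys))

    def purge_url(self, url: str, soft: bool = False) -> None:
        self._purge([url], soft)

    def purge_key(self, key: str, soft: bool = False) -> None:
        # Collect matches first so entries can be deleted safely afterwards.
        urls = [u for u, e in self.entries.items() if key in e.surrogate_keys]
        self._purge(urls, soft)

    def purge_all(self) -> None:
        self.entries.clear()

    def _purge(self, urls: List[str], soft: bool) -> None:
        for url in urls:
            if url not in self.entries:
                continue
            if soft:
                self.entries[url].stale = True  # kept, but flagged as stale
            else:
                del self.entries[url]


edge = EdgeCache()
edge.store("/es/product-a.zip", b"a", ["country-es", "product-a"])
edge.store("/se/product-a.zip", b"a", ["country-se", "product-a"])
edge.store("/es/logo.png", b"png", ["country-es"])

edge.purge_key("product-a", soft=True)   # both ZIPs become stale
edge.purge_key("country-es")             # everything labelled "country-es" is evicted
print(sorted(edge.entries))
```

Purging by key removes a whole group of objects with one call, which is what makes labels attractive when one content change affects many URLs.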
Results
www.flickr.com/photos/89228431@N06/11322953266
 Hit ratio: 49.9%
 Cache coverage: 66.1%
 Requests: 89K
iPad app statistics
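The slides do not define the two metrics. A common reading, and to my understanding the one Fastly's statistics use, is that hit ratio compares hits with misses, while cache coverage is the share of requests that were eligible for caching at all (hits plus misses, as opposed to passes). The sketch below works under that assumption; the request counts are made up and are not the project's numbers.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Share of cacheable requests answered from cache."""
    cacheable = hits + misses
    return hits / cacheable if cacheable else 0.0


def cache_coverage(hits: int, misses: int, passes: int) -> float:
    """Share of all requests that were cacheable (not passed to the origin)."""
    total = hits + misses + passes
    return (hits + misses) / total if total else 0.0


# Hypothetical counts, for illustration only.
hits, misses, passes = 300, 300, 400
print(f"hit ratio: {hit_ratio(hits, misses):.1%}")
print(f"cache coverage: {cache_coverage(hits, misses, passes):.1%}")
```

Read this way, a high hit ratio with low coverage means the cache works well for the small part of the traffic that is allowed to use it.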
What about the speed?
www.flickr.com/photos/129341635@N02/16609174727
 Presentation downloads
 Europe: up to 21% faster
 South America: up to 50% faster
 APAC: up to 83% faster
 API responses
 Europe: up to 60% faster
 South America: up to 40% faster
 APAC: up to 55% faster
Speed boost
Issues?
www.flickr.com/photos/giuseppemilo/15414290956
Crimes against cacheability
www.flickr.com/photos/alancleaver/4121423119
 Adding Set-Cookie to every response
 Auth cookie is not revoked in the browser after logout
 TBD
Crimes against cacheability
“iPad app performance is much better now! But we still have some issues with
authoring. It is really slow in some countries.”
Mr B.
www.flickr.com/photos/r4vi/8640618489
 I was rather skeptical
 Way too dynamic to be considered cacheable?
 What kind of improvement might we get? 5-10%? Is it worth it?
 Don’t know how, but it has been decided to roll things out :)
CDN in front of authoring?
 3 content groups
 36 request patterns
 TTL up to 14 days
 Mostly dynamic + static web GUI resources
 A lot of assets common for every logged in user
CDN + AEM Author
Request pattern                                           Cacheable?
/apps/cq/core/content/login/.*(png|jpg|css|js)$           YES
/libs/cq/i18n/dict.en.json                                YES
/etc/.*.(png|woff|css|js|jpg|gif|ttf|svg|eot|swf|ico)$    YES
/cf#/content/myapp/en/about.html                          NO
Authorized only!
www.flickr.com/photos/rudyjuanito/5170435542
 CDN knows nothing about user session
 The goal is to cache common content for successfully authorized users
 Authorize them at the edge!
Authorize at the edge
Auth tokens
www.flickr.com/photos/cfortier/426610972
 2nd auth cookie (token), readable by CDN
 HMAC function
 2 auth cookies are tied together
 Reference implementation: https://github.com/fastly/token-functions
 Private key shared between AEM and CDN
 CDN can evaluate user session without request to AEM
Auth tokens
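A minimal sketch of how a second, CDN-readable token might be tied to the session cookie with an HMAC. It does not reproduce the linked reference implementation: the token layout (`<expiry>_<signature>`), the secret, the function names and the use of SHA-256 are all assumptions for this example. In the real setup the verification would run at the edge in VCL; Python is used here only to show the logic.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical key shared between AEM and the CDN.
SHARED_SECRET = b"hypothetical-shared-secret"


def _sign(session_id: str, expires_at: int, secret: bytes) -> str:
    message = f"{session_id}:{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def issue_token(session_id: str, expires_at: int,
                secret: bytes = SHARED_SECRET) -> str:
    """Origin side: build the second cookie value, bound to the session ID."""
    return f"{expires_at}_{_sign(session_id, expires_at, secret)}"


def verify_token(session_id: str, token: str, now: Optional[int] = None,
                 secret: bytes = SHARED_SECRET) -> bool:
    """Edge side: validate the token without any request to the origin."""
    now = int(time.time()) if now is None else now
    try:
        expires_text, signature = token.split("_", 1)
        expires_at = int(expires_text)
    except ValueError:
        return False  # malformed token
    if expires_at < now:
        return False  # token expired
    expected = _sign(session_id, expires_at, secret)
    # Constant-time comparison to avoid leaking signature bytes via timing.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


token = issue_token("session-alice", expires_at=2000)
print(verify_token("session-alice", token, now=1000))
print(verify_token("session-bob", token, now=1000))
print(verify_token("session-alice", token, now=3000))
```

Because the signature covers the session ID, a token copied into another session fails verification, which is what ties the two cookies together.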
96.3%
www.flickr.com/photos/spacexphotos/16169087563
 Hit ratio: 96.3%
 Cache coverage: 45.7%
 Requests: 83K
Author statistics
 Adding Set-Cookie to every response
 Auth cookie is not revoked in the browser after logout
 “Vary: Cookie” usage
Crimes against cacheability
www.flickr.com/photos/aushiker/20369395093
What about deployments?
 Does every deploy involve full CDN cache purge?
 Nope!
 iPad presentations are packaged in a ZIP file and versioned
 Majority of authoring related cacheable assets stay untouched between deployments
AEM deployments
Summary
www.flickr.com/photos/andrewhurley/6254409229
 Traffic growth is no longer an issue
 Over 2 TB monthly reaches CDN servers
 ~5.5 million HTTP requests per month
 Just ~570 GB was passed through to AEM
 License, budget and time savings
 More than satisfying results
 Very small changes in the AEM app itself
 Happy client :)
Summary
jakub.wadolowski@cognifide.com
github.com/jwadolowski
twitter.com/jwadolowski
linkedin.com/in/kubawadolowski/en

When dispatcher caching is not enough... (extended version)