Scaling AEM
Michael Marth | Senior Engineering Manager

© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
Scalability?
§ Performance vs. scalability
  § Performance: “it takes X secs to do Y”
  § Scalability: “it takes X secs to do Y simultaneously Z times”
§ Yet, performance can help with scalability
§ This talk is about horizontal scalability (vertical scaling is trivial)
§ The content is documented at
  http://dev.day.com/docs/en/cq/current/deploying/scaling.html

Preliminary Notes
§ This talk is about pre-Oak scalability patterns, i.e. the CQ5.x line
  § With AEM 6 these patterns will change
§ CRX2 Cluster is mostly a technology for reliability, not for scalability
  § Slight exception is read scalability

Scaling AEM
1. High Volume and High Performance Delivery
2. High Frequency Input Feed
3. High Processing Input Feed
4. High Volume Input Feed
5. Many Editors
6. Geo-distributed Editors
7. Many DAM assets
8. Geo-distributed disaster recovery

High Volume and High Performance Delivery - Description
§ Use Case:
  § High-traffic site (100 million impressions per day)
  § Example: adobe.com
§ Limiting factor
  § CPU on publish

High Volume and High Performance Delivery - Solution Pattern
§ Leverage dispatcher caching as much as possible
  § SSI and/or client-side for personalized content
  § In the latest dispatcher: single-page dispatcher flush and scripted flushing; use these to
    cache/flush content in dispatcher
  § Selectors for query caching
§ CDN with short TTL

[Diagram: Client, CDN, Load Balancer (round robin), and two Dispatcher (Disp) / Publish (P) pairs]

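The "selectors for query caching" bullet can be sketched as follows. By default the dispatcher does not cache requests that carry a query string, so moving the parameters into selectors in the path makes the response cacheable. The function name and the `key-value` selector naming scheme are assumptions made for illustration, not a CQ convention.

```python
from urllib.parse import urlsplit, parse_qsl


def to_selector_url(url: str) -> str:
    """Rewrite /path/page.html?a=1&b=2 as /path/page.a-1.b-2.html (hypothetical scheme)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    if not params:
        return parts.path
    base, dot, ext = parts.path.rpartition(".")
    if not dot:
        # No extension to attach selectors to; leave the path unchanged.
        return parts.path
    # Sort the parameters so that equivalent queries map to the same cache entry.
    # Values containing "." or "/" would need escaping; this sketch ignores that.
    selectors = ".".join(f"{key}-{value}" for key, value in sorted(params))
    return f"{base}.{selectors}.{ext}"


print(to_selector_url("/content/site/search.html?page=2&q=shoes"))
# prints /content/site/search.page-2.q-shoes.html
```

The component on publish would then read the selectors instead of request parameters; how it does so is outside the scope of this sketch.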
High Volume and High Performance Delivery
§ Notes
  § Related to rendering performance, see also
    § CQ performance patterns (use the CQ timing component, prefer tree walking over JCR queries,
      use ClientLibraryManager to concat and minify JS, etc.; see [1])
    § Generic performance patterns (reduce requests with e.g. CSS sprites, gzip responses, put JS
      calls at the bottom of the HTML, etc.; see [2])
§ Anti-Pattern
  § Adding publishers before leveraging caching

[1] http://dev.day.com/docs/en/cq/current/deploying/performance.html
[2] http://shop.oreilly.com/product/9780596529307.do

High Frequency Input Feed - Description
§ Use Case: news feed import (moderate amounts, but constant updates)
§ Limiting factor
  § Dispatcher cache invalidation
    § Therefore the actual limiting factor is CPU on publish

High Frequency Input Feed - Solution Pattern 1
§ Set up the content structure so that other pages do not get invalidated in the dispatcher
  cache
  § If possible: highly volatile content e.g. in /etc
  § With the latest dispatcher: single-page flush is possible
§ Separate replication queue (so that the main queue is not blocked)

[Diagram: Backend -(import process)-> Author (A), which is also used for editing; A -(replication)-> Publish (P); P -(flush)-> Dispatcher (Disp)]

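Why the content structure matters for invalidation can be shown with a deliberately simplified model. It assumes that a flush invalidates every cached page in the same subtree as the flushed page, where the subtree is defined by the first `level` path segments. This is not the dispatcher's actual algorithm; `level`, the function names, and the paths are hypothetical.

```python
def subtree_key(path: str, level: int) -> tuple:
    """First `level` directory segments of a cached page's path."""
    directories = path.strip("/").split("/")[:-1]  # drop the file name
    return tuple(directories[:level])


def invalidated(cached_pages, flushed_path, level):
    """Cached pages that share the flushed page's subtree at the given depth."""
    target = subtree_key(flushed_path, level)
    return [page for page in cached_pages if subtree_key(page, level) == target]


cache = [
    "/content/site/en/home.html",
    "/content/site/en/products.html",
    "/content/site/news/ticker.html",
]

# Coarse invalidation: one news update throws away the whole site cache.
print(len(invalidated(cache, "/content/site/news/ticker.html", level=2)))  # prints 3

# Volatile content isolated in its own subtree: only the ticker is invalidated.
print(invalidated(cache, "/content/site/news/ticker.html", level=3))
# prints ['/content/site/news/ticker.html']
```

In this model, the fewer pages that share a subtree with the volatile content, the fewer pages have to be re-rendered on publish after each import.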
High Frequency Input Feed - Solution Pattern 2
§ Set up the content structure so that other pages do not get invalidated in the dispatcher
  cache
  § As in the previous pattern
§ Import directly into Publish (no replication necessary)

[Diagram: Backend -(import process)-> Author (A); Backend -(import process)-> Publish (P); P -(flush)-> Dispatcher (Disp)]

High Frequency Input Feed
§ Questions to ask
  § Is human filtering/processing needed? Then imports should run on author and be
    replicated.
  § If not: is the use case OK with different states on publish?
    § If yes: no replication is needed, so pattern 2 is preferable

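The questions above amount to a small decision procedure, sketched here. The function and parameter names are invented for illustration. The last branch is an assumption: the slide does not say what to do when publish instances must stay consistent, and falling back to pattern 1 is only the natural reading.

```python
def choose_import_pattern(needs_human_review: bool, publish_may_diverge: bool) -> str:
    """Pick between the two high-frequency import patterns described above."""
    if needs_human_review:
        # Editors have to see the content first, so it must arrive on author.
        return "pattern 1: import on author, then replicate"
    if publish_may_diverge:
        # Publish instances may briefly differ, so each can import on its own.
        return "pattern 2: import directly into publish"
    # Assumption (not stated on the slide): replication from one source keeps
    # the publish instances in the same state.
    return "pattern 1: import on author, then replicate"


print(choose_import_pattern(needs_human_review=False, publish_may_diverge=True))
# prints pattern 2: import directly into publish
```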
High Processing Input Feed - Description
§ Use Case:
  § DAM import of images
    § Happens regularly
    § 1000 images at once
  § Other editors are editing content at the same time
§ Limiting factor
  § CPU, memory on author

High Processing Input Feed - Solution Pattern 1
§ Separate processing instances from human editing instances
  § Offload one workflow step, e.g. thumbnail generation from PSDs
§ There can be more than one processing instance
§ Replicate back and forth in packages if possible
  § CQ5.6.1: share the data store (DS) between instances and replicate without binary; offloading framework

[Diagram: Author (A, editing, workflows running) offloads CPU-intensive workflow (WF) steps by replication to two Processing Author instances, which run the workflow steps and replicate the results back]

High Processing Input Feed - Solution Pattern 2
§ Separate pre-processing instances for uploading
  § There can be more than one pre-processing instance
  § CQ5.6.1: share the data store (DS) between instances and replicate without binary

[Diagram: image upload -> Processing Author (workflows running) -(replication)-> Author (A, editing)]

High Processing Input Feed
§ Notes
  § An author cluster can help mitigate the problem, but editors must edit content on the slave
  § Throttling WFs or running them at night can help mitigate the problem
  § If the import is limited by the CPU needed for image conversion, consider using ImageMagick
    rather than Java

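The throttling note can be illustrated with a generic bounded worker pool. This is a conceptual sketch in plain Python, not the CQ workflow engine's own throttling mechanism; `run_throttled`, the asset names, and the limit of 2 are assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_throttled(assets, max_parallel):
    """Process assets with at most max_parallel running at once.

    Returns the results and the highest concurrency that was observed.
    """
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def process(name):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.01)  # stand-in for CPU-heavy work such as rendition generation
        with lock:
            state["running"] -= 1
        return f"{name}.thumb"

    # The pool size is the throttle: extra assets wait in the queue.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(process, assets))
    return results, state["peak"]


results, peak = run_throttled([f"img{i}" for i in range(10)], max_parallel=2)
print(len(results), "assets processed, peak concurrency", peak)
```

The import takes longer overall, but the capacity left over stays available to the editors working on the same instance.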
High Volume Input Feed - Description
§ Use Case:
  § Product data import
    § 1 million products, 10,000 modifications/day
§ Limiting factor
  § Writing to the repository
    § Reads are also blocked
  § Potentially (to a lesser degree), in case repository scans are needed to create diffs:
    § CPU for calculating diffs
    § Repository read caches get flushed

High Volume Input Feed - Solution Pattern
§ Separate import instance to process imports, partition if possible
  § Only useful if the import requires significant CPU (e.g. no diff delivered)
§ Replicate to author
  § Replicate as package
  § CQ5.6.1: share the data store (DS) between instances and replicate without binary
§ Replication to publish as package if possible

[Diagram: Backend -(import process)-> Import Processing Author (import into repository) -(replication of diff)-> Author (A, editing) -(replication of diff)-> Publish (P)]

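When the backend delivers a full export rather than a diff, the import instance has to compute the diff itself before replicating it. A minimal sketch of that step, modelling both the repository state and the feed as dictionaries keyed by product ID; the data layout and names are assumptions for illustration.

```python
def compute_diff(repository: dict, feed: dict):
    """Return (added, changed, removed) between the repository state and a full feed."""
    added = {key: value for key, value in feed.items() if key not in repository}
    changed = {
        key: value
        for key, value in feed.items()
        if key in repository and repository[key] != value
    }
    removed = sorted(key for key in repository if key not in feed)
    return added, changed, removed


repository = {"sku-1": {"price": 10}, "sku-2": {"price": 20}, "sku-3": {"price": 30}}
feed = {"sku-1": {"price": 10}, "sku-2": {"price": 25}, "sku-4": {"price": 40}}

added, changed, removed = compute_diff(repository, feed)
print(added)    # prints {'sku-4': {'price': 40}}
print(changed)  # prints {'sku-2': {'price': 25}}
print(removed)  # prints ['sku-3']
```

Only the three affected products would then be replicated, instead of the whole catalogue.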
High Volume Input Feed
§ Questions to ask
  § Can the import be throttled? Throttling makes most problems much less severe.
§ Notes
  § Use batch saves (1000 nodes) on import (reduces overhead in indexing, etc. and
    speeds up the import overall)
  § Import as nt:unstructured rather than cq:Page if possible
    § If not: switch off heavy listeners (e.g. ContentSync) or use the JcrObservationThrottle
§ Anti-Pattern
  § Using network discs (they usually have high latency)
  § Replicating to publish through the same replication queue as editorial content
    § Might heavily delay the publication of important editorial changes
  § Using PageManager (cannot operate in batch mode, unless you use 5.6.1, where that
    option was added)

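The batch-save note can be sketched with a stand-in session object. `FakeSession` is a hypothetical class that only counts calls; it is not the JCR API. The point is the control flow: one save per 1000 added nodes plus a final save for the remainder.

```python
class FakeSession:
    """Hypothetical stand-in for a repository session; it only counts calls."""

    def __init__(self):
        self.pending = 0
        self.saved = 0
        self.save_calls = 0

    def add_node(self, path, properties):
        self.pending += 1

    def save(self):
        self.saved += self.pending
        self.pending = 0
        self.save_calls += 1


def import_nodes(session, items, batch_size=1000):
    """Add all items, persisting once per batch instead of once per node."""
    for count, (path, properties) in enumerate(items, start=1):
        session.add_node(path, properties)
        if count % batch_size == 0:
            session.save()
    if session.pending:
        session.save()  # persist the last, partially filled batch


session = FakeSession()
products = ((f"/etc/products/p{i}", {"price": i}) for i in range(2500))
import_nodes(session, products)
print(session.saved, "nodes in", session.save_calls, "saves")
# prints 2500 nodes in 3 saves
```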
Many Editors - Description
§ Use Case:
  § News or media portal
  § >50 editors editing content concurrently
§ Limiting factor
  § Depends on what the editors actually do, e.g. MSM rollouts or starting WFs, but is
    usually CPU-bound

Many Editors - Solution Pattern 1
§ Sharding: split up different web sites / parts of web sites onto separate author
  instances
§ Publish instances are shared

[Diagram: three Author (A, editing) instances, one each for site 1, site 2, and site 3, replicate to shared Publish (P) instances]

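Sharding by site means that every content path belongs to exactly one author instance. A minimal sketch of such a routing table follows; the host names and content roots are invented for illustration.

```python
# Hypothetical mapping from content root to the author instance that owns it.
AUTHOR_SHARDS = {
    "/content/site1": "author-1.example.com",
    "/content/site2": "author-2.example.com",
    "/content/site3": "author-3.example.com",
}


def author_for(path: str) -> str:
    """Return the author instance responsible for a content path."""
    for root, host in AUTHOR_SHARDS.items():
        if path == root or path.startswith(root + "/"):
            return host
    raise LookupError(f"no author shard owns {path}")


print(author_for("/content/site2/en/news"))
# prints author-2.example.com
```

Matching on `root + "/"` rather than on the bare prefix keeps a path such as `/content/site10` from being routed to the shard for `/content/site1`.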
Many Editors - Solution Pattern 2
§ Sharding: split up different web sites into separate author instances, but replicate into
  one main author, e.g. for shared workflow processes
  § Practical if the shards do not need to share content
  § Cross-replication can be done, but will be hard to keep consistent
§ Publish instances are shared

[Diagram: three Author (A, editing) instances for site 1, site 2, and site 3 replicate into one main Author (A), which is connected to Publish (P)]

Many Editors
§ Questions to ask
  § What do the editors actually do?
§ Notes
  § An author dispatcher helps to reduce CPU load on author instances
  § An author CRX cluster instead of sharding will mitigate the problem if it is CPU-bound

Geo-distributed Editors
§ Use Case:
  § Editors located in different geos (US, EMEA, APAC)
§ Limiting factor
  § Bandwidth between editor location and author server location

Geo-distributed Editors - Solution Pattern
§ Use Dispatcher in front of Author
§ Guiding principle: limit traffic between Dispatcher and editor location.
  § gzip traffic
  § Use Client Library Manager to minimize traffic
    § minify, concat and gzip all client libraries
  § Cache all responses that are not under /content in
    § Editor’s browser cache
    § Potentially also dispatcher cache

[Diagram: editor's browser (cache) and Dispatcher (Disp, cache) in front of Author (A); minimize requests and gzip responses]

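The effect of concatenating and gzipping client libraries can be shown with the Python standard library alone. The library contents are made up, and minification is left out; the sketch only covers "concat" (fewer requests) and "gzip" (fewer bytes).

```python
import gzip


def bundle(libraries: dict) -> bytes:
    """Concatenate client libraries into one payload (one request instead of many)."""
    return "\n".join(libraries[name] for name in sorted(libraries)).encode("utf-8")


# Hypothetical client libraries with repetitive content, as source code tends to be.
libs = {
    "a.js": "function greet(name) { return 'Hello ' + name; }\n" * 50,
    "b.js": "function add(a, b) { return a + b; }\n" * 50,
}

payload = bundle(libs)
compressed = gzip.compress(payload)
print(len(libs), "requests -> 1 request;", len(payload), "bytes ->", len(compressed), "bytes")
```

On a high-latency link between editor and author, the drop in request count usually matters at least as much as the drop in bytes.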
Geo-distributed Editors
§ Notes
  § In extreme cases consider writing templates that treat author renditions differently from
    publish renditions (especially reducing the number of necessary requests, e.g. by dropping
    requests to tracking servers, external CSS, etc.)
  § Or use Scaffolding for editing

Many DAM Assets
§ Use Case:
  § Many assets (>5 million) in DAM
§ Limiting factor
  § Disc space

Many DAM Assets - Solution Pattern
§ Split physical storage of data store and repository tar files
  § For the data store, high latency is acceptable
  § Tar files need a disc with very low latency
§ Locate data store on cheap discs remotely (NAS, S3)
§ Share data store between instances
  § In 5.6.1: use binary-less replication in case of shared DS to minimize network traffic

[Diagram: Repository with tar files on local disc and data store (DS) on S3]

Many DAM Assets
§ Notes
  § In case of a shared DS: the DS garbage collection needs to be run on an instance that keeps
    references to all assets in the DS
  § In 5.6.1: huge performance improvements (~10x or more) for DS GC when the persistence is
    tar-based

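The first note can be made concrete with a toy model of garbage collection on a shared data store. Binaries and references are plain sets here; the names are invented, and the real DS GC is a mark-and-sweep process over the repository rather than a set intersection.

```python
def collect_garbage(data_store: set, references_seen: set) -> set:
    """Sweep: keep only the binaries that the collecting instance sees as referenced."""
    return data_store & references_seen


data_store = {"bin-a", "bin-b", "bin-c"}
author_refs = {"bin-a", "bin-b"}
publish_refs = {"bin-b", "bin-c"}

# GC with only the author's view deletes bin-c, although publish still uses it.
unsafe = collect_garbage(data_store, author_refs)
print(sorted(unsafe))  # prints ['bin-a', 'bin-b']

# GC with references from every instance sharing the DS keeps all live binaries.
safe = collect_garbage(data_store, author_refs | publish_refs)
print(sorted(safe))  # prints ['bin-a', 'bin-b', 'bin-c']
```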
Geo-distributed disaster recovery
§ Use Case:
  § Data centers located in different geos
  § One DC shall act as failover for author
§ Limiting factor
  § Latency between DCs (in very low latency cases CRX clustering could be used)

Geo-distributed disaster recovery - Solution Pattern
§ Use file-level tools like rsync to create replicas in the 2nd DC
§ Hourly: sync data store
  § This is usually the most time-consuming part
  § Sync can be performed anytime, due to the add-only data store architecture
§ Nightly:
  § Create an incremental backup into the filesystem on the 1st DC to get a consistent state of files
  § Rsync the backup to the 2nd DC. For that period CQ on the 2nd DC must not be running.

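The two sync jobs described above can be sketched as rsync command lines. The script only builds and prints the commands and does not run them. All paths and the host name are hypothetical, and the use of `--delete` for the nightly mirror is an assumption, not something stated in the talk.

```python
import shlex


def datastore_sync_cmd(src: str, dest: str) -> list:
    """Hourly job: the data store is add-only, so new files only need to be copied."""
    return ["rsync", "-a", src, dest]


def backup_sync_cmd(src: str, dest: str) -> list:
    """Nightly job: mirror the consistent backup; CQ in the 2nd DC must be stopped."""
    # Assumption: --delete keeps the replica identical to the backup directory.
    return ["rsync", "-a", "--delete", src, dest]


hourly = datastore_sync_cmd("/data/cq/datastore/", "dc2.example.com:/data/cq/datastore/")
nightly = backup_sync_cmd("/backup/cq/repository/", "dc2.example.com:/data/cq/repository/")

print(shlex.join(hourly))
# prints rsync -a /data/cq/datastore/ dc2.example.com:/data/cq/datastore/
print(shlex.join(nightly))
# prints rsync -a --delete /backup/cq/repository/ dc2.example.com:/data/cq/repository/
```

Keeping the two jobs separate reflects the split in the pattern: the data store sync is safe at any time, while the repository sync depends on a consistent backup and a stopped target instance.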

Scaling AEM (CQ5) Gem Session
