Slides presented at the Circuit14 conference in Chicago 6/4/14. Topic was the replication framework of Adobe Experience Manager (AEM) and how it can get customized to address various use cases.
Demonstrated sample code is accessible at GitHub: https://github.com/mwmd/circuit14-aem-replication
Adobe Experience Manager - Replication deep dive
1. CIRCUIT – An Adobe Developer Event
Presented by CITYTECH, Inc.
AEM Replication
deep dive
Matthias Wermund
Acquity Group
part of Accenture Interactive
2. Replication overview
• AEM replication is used for (un)publishing
of AEM content
• Most interaction between AEM Author and
AEM Publish is part of replication
3. Key replication components
• Agent
– One per receiving end
• Queue
– Tracks replication requests
– One per agent
• ContentBuilder
– Transforms replicated content to payload
4. Types of replication
• Activate / Deactivate
– Durbo
– Flush
– Binary-less
– Static
• Reverse replication
5. Dispatcher cache flush
• HTTP request to web server module
• Effect dependent on dispatcher
configuration
– Deletion of activated content
– Invalidation of related content (statfileslevel)
• On Author vs. On Publish
– Race condition of replication and invalidation
– On Publish safer, but not always possible
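The flush itself is an ordinary HTTP request to the dispatcher module. The following is a minimal sketch using the JDK's `java.net.http`, assuming the commonly documented defaults: the `/dispatcher/invalidate.cache` endpoint and the `CQ-Action` and `CQ-Handle` headers. The base URL, the choice of POST and the class name `DispatcherFlushRequest` are illustrative assumptions; the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class DispatcherFlushRequest {
    // Builds (but does not send) a cache flush request for one content path.
    // dispatcherBaseUrl is a placeholder such as "http://localhost".
    static HttpRequest build(String dispatcherBaseUrl, String action, String contentPath) {
        return HttpRequest.newBuilder()
                .uri(URI.create(dispatcherBaseUrl + "/dispatcher/invalidate.cache"))
                .header("CQ-Action", action)       // e.g. "Activate" or "Deactivate"
                .header("CQ-Handle", contentPath)  // path of the replicated content
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

What the dispatcher does with such a request (delete the file, touch `.stat` files) still depends on its own configuration, as the slide notes.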
6. Ways to trigger replication
• Page authoring
• Tree activation
• Workflow
• Custom code (API)
7. Scenario: Publication preview
• Before going live to end users, preview of
change in internal AEM Publish is required
• Solution approach:
– Integration in publishing workflow
– Activation to only selected AEM Publish
– Review & approve before standard activation
8. Publishing workflow
9. Custom workflow process step
[Code listing with callouts 1, 2 and 3; see Editor's Notes]
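The core idea of this step is restricting a replication to a single agent. The sketch below illustrates that selection logic with stand-in types only: `Agent`, `AgentFilter`, `PreviewAgentSelection` and the agent id `publish_preview` are hypothetical names for this example, not the AEM classes themselves. In the real step, the notes describe passing an `AgentIdFilter` through `ReplicationOptions` to the `Replicator` service.

```java
import java.util.List;
import java.util.stream.Collectors;

// Stand-in for a replication agent; only the id matters for this sketch.
interface Agent {
    String getId();
}

// Stand-in for the filter concept: decides whether an agent takes part in a replication.
interface AgentFilter {
    boolean isIncluded(Agent agent);
}

class PreviewAgentSelection {
    // Hypothetical id of the agent that targets the Preview Publish instance.
    static final String PREVIEW_AGENT_ID = "publish_preview";

    // Accepts exactly one agent id and ignores all other agents.
    static AgentFilter onlyAgent(String agentId) {
        return agent -> agentId.equals(agent.getId());
    }

    // Returns the ids of the agents that would receive the replication.
    static List<String> selectedAgentIds(List<Agent> agents, AgentFilter filter) {
        return agents.stream()
                .filter(filter::isIncluded)
                .map(Agent::getId)
                .collect(Collectors.toList());
    }
}
```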
10. Scenario: Global replication notification
• All replication invocations need to trigger
an external search index update
• Solution approach:
– Implement OSGi listener for replication events
– Get replication metadata from event
– Trigger indexing based on replication type
11. Replication event listener
[Code listing with callouts 1 and 2; see Editor's Notes]
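Once the event has been converted to replication metadata, the listener only needs to branch on the replication type. The sketch below shows that branching with stand-in types: `ActionType`, `SearchIndex` and `ReplicationIndexingHandler` are hypothetical names for this example, and the in-memory set takes the place of the external search index. The OSGi subscription itself is not shown.

```java
import java.util.Set;
import java.util.TreeSet;

// Stand-in for the replication types carried in the event metadata.
enum ActionType { ACTIVATE, DEACTIVATE, DELETE }

// Hypothetical in-memory index; a real one would call the external search system.
class SearchIndex {
    private final Set<String> paths = new TreeSet<>();

    void update(String path) { paths.add(path); }
    void remove(String path) { paths.remove(path); }
    Set<String> indexedPaths() { return paths; }
}

class ReplicationIndexingHandler {
    private final SearchIndex index;

    ReplicationIndexingHandler(SearchIndex index) { this.index = index; }

    // Receives every replication event; any path filtering would have to happen in here.
    void handle(ActionType type, String path) {
        switch (type) {
            case ACTIVATE:
                index.update(path);   // content became visible on Publish
                break;
            case DEACTIVATE:
            case DELETE:
                index.remove(path);   // content is no longer visible on Publish
                break;
        }
    }
}
```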
12. Scenario: Guaranteed replication
• Before Author users are asked to review,
the activation to Preview Publish must
have been successfully completed
• Solution approach:
– Use synchronous replication
– Verify success via replication listener
13. Advanced publishing workflow
14. Synchronous replication with listener
[Code listing with callouts 1 and 2; see Editor's Notes]
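The pattern here is: replicate without the queue, capture the outcome in a listener, and store it where the workflow's OR split can read it. The sketch below models that flow with stand-in types. `ResultListener`, `Transport`, `SynchronousPreviewActivation`, the agent id and the metadata key are all assumptions for this example; in AEM the notes refer to `ReplicationOptions` and `ReplicationListener` for this purpose.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in callback, modeled on the idea of a replication listener.
interface ResultListener {
    void onEnd(String agentId, boolean success);
}

// Hypothetical transport; returns true when the receiving instance accepted the content.
interface Transport {
    boolean deliver(String path);
}

class SynchronousPreviewActivation {
    // Workflow metadata key is an assumption for this sketch.
    static final String RESULT_KEY = "previewActivationSucceeded";

    // Delivers immediately (no queue) and reports the outcome before returning.
    static void replicateSynchronously(String agentId, String path,
                                       Transport transport, ResultListener listener) {
        boolean success;
        try {
            success = transport.deliver(path);
        } catch (RuntimeException e) {
            success = false; // a transport failure counts as an unsuccessful replication
        }
        listener.onEnd(agentId, success);
    }

    // Runs the preview activation and returns the metadata a later workflow step could read.
    static Map<String, Object> activateForPreview(String path, Transport transport) {
        Map<String, Object> workflowMetadata = new HashMap<>();
        replicateSynchronously("publish_preview", path, transport,
                (agentId, success) -> workflowMetadata.put(RESULT_KEY, success));
        return workflowMetadata;
    }
}
```

Because the call is synchronous, the metadata is already populated when `activateForPreview` returns, which is what makes the subsequent split decision possible.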
15. Scenario: Data filtering
• For privacy, author user information must
get removed from AEM Publish content
• Solution approach:
– Create custom content filter implementation
– Filter out undesired properties
16. Custom ReplicationContentFilter
[Code listing with callouts 1 and 2; see Editor's Notes]
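The filter boils down to a yes/no decision per property. The sketch below shows that decision on a plain map instead of JCR nodes; `AuthorInfoFilter` is a hypothetical class, and the list of filtered property names is an assumption about which properties carry author user ids in a given project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class AuthorInfoFilter {
    // Property names that commonly carry author user ids; the exact list is an assumption.
    static final Set<String> FILTERED = Set.of(
            "jcr:createdBy", "jcr:lastModifiedBy", "cq:lastModifiedBy", "cq:lastReplicatedBy");

    // Mirrors the per-property "accepts" decision: false means the property is not replicated.
    static boolean accepts(String propertyName) {
        return !FILTERED.contains(propertyName);
    }

    // Returns a copy of the properties without the filtered entries, keeping their order.
    static Map<String, Object> filter(Map<String, Object> properties) {
        Map<String, Object> result = new LinkedHashMap<>();
        properties.forEach((name, value) -> {
            if (accepts(name)) {
                result.put(name, value);
            }
        });
        return result;
    }
}
```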
17. ReplicationContentFilterFactory
18. Scenario: Custom replication payload
• When publishing, content must get
exported to a 3rd-party system in JSON
• Solution approach:
– Create a custom ContentBuilder
– Invoke standard JSON renderer for page
– Send HTTP POST using replication agent
19. ContentBuilder implementation (1/2)
[Code listing with callouts 1 and 2; see Editor's Notes]
20. ContentBuilder implementation (2/2)
[Code listing with callouts 3 and 4; see Editor's Notes]
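The builder's job is to turn content into the request payload. The sketch below illustrates only that transformation step, and it swaps in a small hand-rolled serializer where the slides use the standard JSON renderer. `JsonPayloadBuilder` is a hypothetical class, it handles flat string properties only, and storing the result via the replication content factory is not shown.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class JsonPayloadBuilder {
    // Escapes backslashes and double quotes only; control characters are not handled here.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Serializes flat string properties to a JSON object with keys in sorted order.
    static String toJson(Map<String, String> properties) {
        return new TreeMap<>(properties).entrySet().stream()
                .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }
}
```

Sorting the keys keeps the payload deterministic, which makes it easier to compare what was sent across repeated activations.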
21. JSON replication agent
22. Scenario: Content partitioning
• Different sites must get replicated only to
dedicated AEM Publish instances
• Solution approach:
– Create system user account per site
– Configure replication agent with user account
– Configure ACL to READ for only one site
23. Configure agent with user account
24. Setup ACL for single site
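The effect of the ACL can be pictured as a path check: the agent's user can only replicate what it can read. The sketch below is a simplified model of that check, not the repository's access control evaluation; `SiteScopedAgentUser` is a hypothetical class and the paths are the Geometrixx sample sites.

```java
import java.util.List;

class SiteScopedAgentUser {
    private final List<String> readableRoots;

    SiteScopedAgentUser(List<String> readableRoots) {
        this.readableRoots = readableRoots;
    }

    // True when the path is one of the readable roots or lies below one.
    // Appending "/" prevents "/content/geometrixx" from matching "/content/geometrixx-outdoors".
    boolean canRead(String path) {
        return readableRoots.stream()
                .anyMatch(root -> path.equals(root) || path.startsWith(root + "/"));
    }
}
```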
25. Thanks
• Questions?
matthias.c.wermund@accenture.com
github.com/mwmd
Editor's Notes
- Personal intro
In this session I’ll speak about the replication framework in AEM, and go into some non-standard use cases with possible solutions for them.
Question 1: Who knows what AEM replication is and how to use it?
Question 2: Who has implemented a custom ReplicationContentFilter?
((Last question: Who has implemented a custom ContentBuilder for replication?))
- Almost everyone uses replication: by the nature of AEM, any setup with an Author and a Publish instance relies on it.
I always think of the replication framework as a somewhat “hidden gem”: it can do much more than support the standard use cases.
Agent: Main configuration object; typically each Agent triggers one HTTP request
Lots of options: Transport details, Triggers, Versioning
Queue: Persisted, survives instance restart
ContentBuilder
Activate/Deactivate takes care of controlling Publish visibility of content
Durbo: Default replication mechanism, packaging content for transport
Flush: Dispatcher cache flushing, more on that in a bit
Binary-less: More efficient transport for large objects like Assets. Assumes the data store is shared between Author and Publish instances.
Static: Writes replicated content into file system, for example for archival
Reverse replication: Transport of user generated content from Publish to Author, deserves its own session
Dispatcher.any configuration file
Authoring:
Sidekick and SiteAdmin
Implicit, example: Activated page move
Tree Activation: Bulk activation tool
Workflow: Out of the box steps, customizations (will show example)
Custom code:
Replication API provides extensible framework
In the following, several use case scenarios will demonstrate how to leverage the available API
Reasons for Publish Preview:
Give non-author users ability to see content
External system integrations sometimes not feasible in Author (infrastructure etc.)
Inserted process step using custom implementation
Custom step will activate to only the Preview Publish instance
Users can review the content in Preview Publish
Manual approval will trigger standard activation to all AEM Publish instances
1: ReplicationOptions allows per-replication configuration of agent, overriding agent config
2: AgentIdFilter: Will ignore all agents except the one specified.
3: Replicator: Central service to trigger replication actions. Needs a valid JCR session to execute against.
At this point, the replication item has been added to the Agent’s queue.
For example, delta updates to the search index in between scheduled full index rebuilds
The approach is broadly applicable to any kind of operation triggered by replication.
However, the listener is called for all replication events; filtering by content path or other criteria is not possible.
1: OSGi EventHandler implementation, subscribing to Replication events
2: Convert generic Event data to Replication metadata using Replication API
Then react based on Replication metadata, here updating or dropping items in the search index.
“Guaranteed” in the sense that the result is known when proceeding after the service call
The default replication method is asynchronous and uses the agent’s queue.
Synchronous bypasses the queue and executes the replication immediately.
Use of ReplicationListener is necessary to get result of replication action.
Added OR split, based on result of preview activation
1: Make replication synchronous. Default is asynchronous. Specify listener to be used during replication process.
2: Verify replication success and put result in workflow metadata so other processes can access the result.
1: Implements interface of Replication API. Isn’t an OSGi component because of Factory pattern.
2: “Accepts” method called for each property on every node during replication
- Interface also supports filtering of full nodes.
- OSGi factory to make the custom ContentFilter known to the container.
- One can filter for specific activation actions, for example only type ACTIVATE.
Example: Content syndication to J2EE application stack
Custom ContentBuilder: Triggers the transformation to JSON
JSON renderer: Same out of the box functionality when using JSON extension
HTTP POST is simple configuration of a new replication agent, similar to standard replication agents
1: OSGi service declaration, implementing ContentBuilder interface
2: Create method handles transformation of node structure to replication payload
- ReplicationContent.VOID if no payload required (example: Dispatcher Flush)
3: Leveraging standard JSON renderer to generate JSON payload
4: ReplicationContentFactory will store the payload
Custom builder will be listed because of being an OSGi service for ContentBuilder interface
The Serialization will generate the request payload. Everything else of the agent remains the normal functionality (HTTP request etc.)
Typical multi-tenant scenario. One client, shared author for reuse of content. Separation of Publish instances could be for infrastructure reasons, or different security requirements.
!! Works for Dispatcher Flush agents too (separating Dispatcher cache control per site) !!
Default system user has access to everything
Custom user configured for this agent. All replication actions for this agent will impersonate this user.
User account specific for standard geometrixx site
No access other than READ on the /content/geometrixx tree
Log output when attempting a replication for another site (geometrixx outdoors) using this replication agent