CRX2Oak - all the secrets of repository migration

#evolverocks
CRX2OAK – ALL THE SECRETS OF
REPOSITORY MIGRATION
TOMEK RĘKAWEK, ADOBE RESEARCH
Aug 30, 2016

#evolverocks 2
• Overview of CRX2Oak
• CRX2Oak command line
• Features
• Case study: large migration
• General migration tips
• Using CRX2Oak for AEM upgrade
• Q & (hopefully) A
AGENDA

#evolverocks 3
OVERVIEW OF THE CRX2OAK
UPGRADE FROM CRX2
CQ 5.x – CRX2 AEM 6.x – Jackrabbit Oak

#evolverocks 4
UPGRADE OR SIDEGRADE
CQ 5.x – CRX2
AEM 6.x – Jackrabbit Oak
AEM 6.x – Oak

#evolverocks 5
MIGRATING BINARIES

#evolverocks 6
• CRX2Oak is a command-line tool:
• java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
• Source and target defines the repositories. Supported formats:
• path to the CRX2 “repository” directory, eg.
crx-quickstart/repository
• path to the Oak SegmentMK “repository” directory, as above
• Mongo URI, eg.
mongodb://localhost:27017/aem
• JDBC URI, eg.
jdbc:mysql://localhost:3306/sakila?profileSQL=true
CRX2OAK COMMAND LINE
REPOSITORY PARAMETER TYPES

#evolverocks 7
• java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
• The source blob store is defined using: --src-datastore or --src-s3datastore.
• If there’s no blob store defined for source, CRX2Oak assumes embedded
• If the source blob store is defined, it will be used for target as well (only
references will be copied, not actual binaries)
• It can be overridden with --copy-binaries
• Destination blob store can be defined with: --datastore or --s3datastore
CRX2OAK COMMAND LINE
DEFINING DATASTORE TO BE USED

#evolverocks 8
FEATURES
SELECTING PATHS TO MIGRATE

#evolverocks 9
FEATURES
MIGRATING VERSION STORAGE

#evolverocks 10
• Client requirements
• CQ 5.6.1 instance with a large number of sites and assets, storing binaries in S3
• The content is being authored 24/7
• The migration of the whole content takes about 20h
• The migration is being done offline and the instance can’t be down so long
• The upgraded instance has to be tested before going live
• Strategy
• Snapshot the instance and migrate the copy
• Perform tests on it
• Top-up the changes introduced after snapshot
CASE STUDY
INTRODUCTION

#evolverocks 11
CASE STUDY
STRATEGY

#evolverocks 12
• The migration (4) will be much faster, as only the diff will be migrated
• In the (4) use --skip-init, so the existing repository won’t be reinitialized
• Also, use --include-paths=/content/mysite to migrate only the modified
subtree
CASE STUDY
REMARKS

#evolverocks 13
• When using Mongo (either as source or destination), run CRX2Oak on the same
machine as Mongo primary
• If you don’t need version history for deleted nodes, use --copy-orphaned-
versions=false to make the migration faster
• CRX2Oak may be used to copy content between existing repositories. Use
following parameters:
• --skip-init, so the destination is not initialized with the index definitions,
• --{include,merge}-paths to refer which subtrees should be copied
• --copy-orphaned-versions=false
GENERAL MIGRATION TIPS

#evolverocks 14
• When upgrading CQ 5.x + S3, crx2oak calls AWS asking for length of each binary
• the lengths are stored in Oak but not in CRX2, so we have to ask about it
• For a large repositories it may slow down the whole migration
• It’s possible to pre-fetch all lengths, store them in a text file and configure CRX
(and therefore CRX2Oak) to use it
• More information:
• https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/upgrade
/blob/LengthCachingDataStore.html
• Sample configuration files:
• http://bit.ly/cq5-s3-upgrade
GENERAL MIGRATION TIPS
UPGRADING CQ 5.X STORING BINARIES IN AWS S3

#evolverocks 15
• UUID conflict exception
• may occur if the destination repository already exists (iterative migration)
• remember to add --copy-orphaned-versions=false
• when using --include-paths, include all modified paths:
• otherwise, if the page has been moved and we include only the destination path,
CRX2Oak won’t remove the page from its original position
• BlobId not found exception
• either source or destination blob store is not configured correctly
• Unable to delete referenced node
• probably CRX2Oak tries to overwrite the whole version storage (removing existing
versions)
• add --copy-orphaned-versions=false
TROUBLESHOOTING

#evolverocks 16
Official docs describes using the extension:
• java -jar aem-quickstart-6.2.0.jar -unpack # unpack the AEM jar
• java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare extension
config
• java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare OSGi config
• java -Xmx4096m -XX:MaxPermSize=2048M -jar aem-quickstart-6.2.0.jar -v -
x crx2oak -xargs -- -o migrate
For running the CRX2Oak manually, the last command should be replaced with:
• java -Xmx4096m -XX:MaxPermSize=2048M -jar crx-
quickstart/opt/helpers/crx2oak/crx2oak.jar [source] [destination]
USING EXTENSION VS RUNNING CRX2OAK
MANUALLY

#evolverocks 17
• All CRX2Oak versions offer similar features
• They differ in:
• Oak version used underneath (as the CRX2Oak starts a normal Oak repository)
• Index definitions created during the repository initialisation
• These both things are assigned to the AEM version and shouldn’t be mismatched
• Table of truth:
• CRX2Oak 1.2.x can be used with AEM 6.1 too, but it won’t have all the
advanced features
VERSIONS
AEM Oak CRX2Oak
AEM 6.0 1.0.x 1.0.x
AEM 6.1 1.2.x 1.3.x (sic!)
AEM 6.2 1.4.x 1.4.x

#evolverocks 18
• CRX2Oak downloads:
• https://repo.adobe.com/nexus/content/groups/public/com/adobe/granite/crx2oak/
• CRX2Oak documentation
• https://docs.adobe.com/docs/en/aem/6-2/deploy/upgrade/using-crx2oak.html
• oak-upgrade documentation:
• https://jackrabbit.apache.org/oak/docs/migration.html
RESOURCES

#evolverocks
THANK YOU!
http://tomek.rekawek.eu
@Tomek1024
rekawek@adobe.com

CRX2Oak - all the secrets of repository migration

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to CRX2Oak - all the secrets of repository migration

Similar to CRX2Oak - all the secrets of repository migration (20)

More from Tomasz Rękawek

More from Tomasz Rękawek (9)

Recently uploaded

Recently uploaded (20)

CRX2Oak - all the secrets of repository migration