#evolverocks
CRX2OAK – ALL THE SECRETS OF
REPOSITORY MIGRATION
TOMEK RĘKAWEK, ADOBE RESEARCH
Aug 30, 2016
CRX2Oak - all the secrets of repository migration

CRX2Oak is an official migration tool for moving data between different repository types. The most common use case is upgrading an old CQ 5.x repository to the AEM 6.x format. This session covers basic CRX2Oak usage, describes its more advanced options, and shares some real-world cases of large-scale (hundreds of GBs) data migration.

Published in: Software

  1. CRX2OAK – ALL THE SECRETS OF REPOSITORY MIGRATION
     TOMEK RĘKAWEK, ADOBE RESEARCH
     Aug 30, 2016
  2. AGENDA
     • Overview of CRX2Oak
     • CRX2Oak command line
     • Features
     • Case study: large migration
     • General migration tips
     • Using CRX2Oak for AEM upgrade
     • Q & (hopefully) A
  3. OVERVIEW OF THE CRX2OAK: UPGRADE FROM CRX2
     Diagram labels: CQ 5.x – CRX2; AEM 6.x – Jackrabbit Oak
  4. OVERVIEW OF THE CRX2OAK: UPGRADE OR SIDEGRADE
     Diagram labels: CQ 5.x – CRX2; AEM 6.x – Jackrabbit Oak; AEM 6.x – Oak
  5. OVERVIEW OF THE CRX2OAK: MIGRATING BINARIES
  6. CRX2OAK COMMAND LINE: REPOSITORY PARAMETER TYPES
     • CRX2Oak is a command-line tool:
       • java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
     • SOURCE and TARGET define the repositories. Supported formats:
       • path to the CRX2 “repository” directory, e.g. crx-quickstart/repository
       • path to the Oak SegmentMK “repository” directory, as above
       • Mongo URI, e.g. mongodb://localhost:27017/aem
       • JDBC URI, e.g. jdbc:mysql://localhost:3306/sakila?profileSQL=true
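To make the three argument shapes on this slide concrete, here is a small sketch that guesses the repository type from a SOURCE or TARGET value. The `repo_type` function is hypothetical and is not part of CRX2Oak; how the tool itself parses these arguments is not shown in the slides. A directory path can mean either a CRX2 or an Oak SegmentMK repository, so the sketch does not try to tell those two apart.

```shell
#!/bin/sh
# Hypothetical helper (not part of CRX2Oak): classify a repository argument
# by the formats listed on the slide.
repo_type() {
  case "$1" in
    mongodb://*) echo "mongo" ;;      # Mongo URI
    jdbc:*)      echo "jdbc" ;;       # JDBC URI
    *)           echo "directory" ;;  # CRX2 or Oak SegmentMK "repository" directory
  esac
}

repo_type "crx-quickstart/repository"
repo_type "mongodb://localhost:27017/aem"
repo_type "jdbc:mysql://localhost:3306/sakila?profileSQL=true"
```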
  7. CRX2OAK COMMAND LINE: DEFINING DATASTORE TO BE USED
     • java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
     • The source blob store is defined using --src-datastore or --src-s3datastore
     • If no blob store is defined for the source, CRX2Oak assumes an embedded one
     • If the source blob store is defined, it will be used for the target as well (only references will be copied, not the actual binaries)
       • This can be overridden with --copy-binaries
     • The destination blob store can be defined with --datastore or --s3datastore
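The datastore options above might be combined as in the following sketch. All paths are placeholders, and passing each value as `--option=<path>` is an assumption that the slides do not confirm. The script only prints the commands (a dry run), so it can be reviewed before anything is executed.

```shell
#!/bin/sh
# Placeholder paths (assumptions); adjust them to the actual installation.
SRC_REPO="/old/crx-quickstart/repository"
SRC_DATASTORE="/old/crx-quickstart/repository/datastore"
DST_REPO="/new/crx-quickstart/repository"
DST_DATASTORE="/new/crx-quickstart/repository/datastore"

# Variant 1: declare the source datastore only. Per the slide, the target
# then reuses it and only references are copied.
# Assumption: the value is passed as --src-datastore=<path>.
CMD="java -jar crx2oak.jar --src-datastore=$SRC_DATASTORE $SRC_REPO $DST_REPO"

# Variant 2: override that behaviour with --copy-binaries and give the
# target its own datastore.
CMD_COPY="java -jar crx2oak.jar --src-datastore=$SRC_DATASTORE --copy-binaries --datastore=$DST_DATASTORE $SRC_REPO $DST_REPO"

# Dry run: print both commands instead of executing them.
echo "$CMD"
echo "$CMD_COPY"
```

Once the printed command looks right, it can be pasted into the shell and run from the directory that contains crx2oak.jar.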
  8. FEATURES: SELECTING PATHS TO MIGRATE
  9. FEATURES: MIGRATING VERSION STORAGE
  10. CASE STUDY: INTRODUCTION
      • Client requirements
        • CQ 5.6.1 instance with a large number of sites and assets, storing binaries in S3
        • The content is being authored 24/7
        • The migration of the whole content takes about 20h
        • The migration is done offline, and the instance can’t be down for that long
        • The upgraded instance has to be tested before going live
      • Strategy
        • Snapshot the instance and migrate the copy
        • Perform tests on it
        • Top up the changes introduced after the snapshot
  11. CASE STUDY: STRATEGY
  12. CASE STUDY: REMARKS
      • The migration (4) will be much faster, as only the diff will be migrated
      • In (4), use --skip-init so the existing repository won’t be reinitialized
      • Also use --include-paths=/content/mysite to migrate only the modified subtree
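A top-up run like the migration labelled (4) on this slide could look roughly like the sketch below. The repository paths are placeholders, and the datastore options that the S3-backed case study would also need are left out, since their exact values are not given in the slides. The script prints the command rather than running it.

```shell
#!/bin/sh
# Placeholder paths (assumptions): the live CQ 5.x repository and the
# already-migrated Oak repository that receives the top-up.
SRC_REPO="/cq5/crx-quickstart/repository"
DST_REPO="/aem6/crx-quickstart/repository"

# --skip-init keeps the existing target repository from being reinitialized;
# --include-paths limits the run to the subtree that changed.
TOPUP_CMD="java -jar crx2oak.jar --skip-init --include-paths=/content/mysite $SRC_REPO $DST_REPO"

# Dry run: print the command instead of executing it.
echo "$TOPUP_CMD"
```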
  13. GENERAL MIGRATION TIPS
      • When using Mongo (either as source or destination), run CRX2Oak on the same machine as the Mongo primary
      • If you don’t need version history for deleted nodes, use --copy-orphaned-versions=false to make the migration faster
      • CRX2Oak may be used to copy content between existing repositories. Use the following parameters:
        • --skip-init, so the destination is not initialized with the index definitions
        • --{include,merge}-paths to specify which subtrees should be copied
        • --copy-orphaned-versions=false
  14. GENERAL MIGRATION TIPS: UPGRADING CQ 5.X STORING BINARIES IN AWS S3
      • When upgrading CQ 5.x + S3, CRX2Oak calls AWS to ask for the length of each binary
        • The lengths are stored in Oak but not in CRX2, so CRX2Oak has to ask for them
      • For a large repository this may slow down the whole migration
      • It’s possible to pre-fetch all lengths, store them in a text file and configure CRX (and therefore CRX2Oak) to use it
      • More information:
        • https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/upgrade/blob/LengthCachingDataStore.html
      • Sample configuration files:
        • http://bit.ly/cq5-s3-upgrade
  15. TROUBLESHOOTING
      • UUID conflict exception
        • may occur if the destination repository already exists (iterative migration)
        • remember to add --copy-orphaned-versions=false
        • when using --include-paths, include all modified paths; otherwise, if a page has been moved and only the destination path is included, CRX2Oak won’t remove the page from its original position
      • BlobId not found exception
        • either the source or the destination blob store is not configured correctly
      • Unable to delete referenced node
        • CRX2Oak is probably trying to overwrite the whole version storage (removing existing versions)
        • add --copy-orphaned-versions=false
  16. USING EXTENSION VS RUNNING CRX2OAK MANUALLY
      The official docs describe using the extension:
      • java -jar aem-quickstart-6.2.0.jar -unpack # unpack the AEM jar
      • java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare extension config
      • java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare OSGi config
      • java -Xmx4096m -XX:MaxPermSize=2048M -jar aem-quickstart-6.2.0.jar -v -x crx2oak -xargs -- -o migrate
      To run CRX2Oak manually, replace the last command with:
      • java -Xmx4096m -XX:MaxPermSize=2048M -jar crx-quickstart/opt/helpers/crx2oak/crx2oak.jar [source] [destination]
  17. VERSIONS
      • All CRX2Oak versions offer similar features
      • They differ in:
        • Oak version used underneath (as CRX2Oak starts a normal Oak repository)
        • Index definitions created during the repository initialisation
      • Both of these are tied to the AEM version and shouldn’t be mismatched
      • Table of truth:

          AEM       Oak     CRX2Oak
          AEM 6.0   1.0.x   1.0.x
          AEM 6.1   1.2.x   1.3.x (sic!)
          AEM 6.2   1.4.x   1.4.x

        • CRX2Oak 1.2.x can be used with AEM 6.1 too, but it won’t have all the advanced features
  18. RESOURCES
      • CRX2Oak downloads:
        • https://repo.adobe.com/nexus/content/groups/public/com/adobe/granite/crx2oak/
      • CRX2Oak documentation:
        • https://docs.adobe.com/docs/en/aem/6-2/deploy/upgrade/using-crx2oak.html
      • oak-upgrade documentation:
        • https://jackrabbit.apache.org/oak/docs/migration.html
  19. THANK YOU!
      http://tomek.rekawek.eu
      @Tomek1024
      rekawek@adobe.com
