SlideShare a Scribd company logo
#evolverocks
CRX2OAK – ALL THE SECRETS OF
REPOSITORY MIGRATION
TOMEK RĘKAWEK, ADOBE RESEARCH
Aug 30, 2016
#evolverocks 2
• Overview of CRX2Oak
• CRX2Oak command line
• Features
• Case study: large migration
• General migration tips
• Using CRX2Oak for AEM upgrade
• Q & (hopefully) A
AGENDA
#evolverocks 3
OVERVIEW OF THE CRX2OAK
UPGRADE FROM CRX2
CQ 5.x – CRX2 AEM 6.x – Jackrabbit Oak
#evolverocks 4
OVERVIEW OF THE CRX2OAK
UPGRADE OR SIDEGRADE
CQ 5.x – CRX2
AEM 6.x – Jackrabbit Oak
AEM 6.x – Oak
#evolverocks 5
OVERVIEW OF THE CRX2OAK
MIGRATING BINARIES
#evolverocks 6
• CRX2Oak is a command-line tool:
• java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
• Source and target defines the repositories. Supported formats:
• path to the CRX2 “repository” directory, eg.
crx-quickstart/repository
• path to the Oak SegmentMK “repository” directory, as above
• Mongo URI, eg.
mongodb://localhost:27017/aem
• JDBC URI, eg.
jdbc:mysql://localhost:3306/sakila?profileSQL=true
CRX2OAK COMMAND LINE
REPOSITORY PARAMETER TYPES
#evolverocks 7
• java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET
• The source blob store is defined using: --src-datastore or --src-s3datastore.
• If there’s no blob store defined for source, CRX2Oak assumes embedded
• If the source blob store is defined, it will be used for target as well (only
references will be copied, not actual binaries)
• It can be overridden with --copy-binaries
• Destination blob store can be defined with: --datastore or --s3datastore
CRX2OAK COMMAND LINE
DEFINING DATASTORE TO BE USED
#evolverocks 8
FEATURES
SELECTING PATHS TO MIGRATE
#evolverocks 9
FEATURES
MIGRATING VERSION STORAGE
#evolverocks 10
• Client requirements
• CQ 5.6.1 instance with a large number of sites and assets, storing binaries in S3
• The content is being authored 24/7
• The migration of the whole content takes about 20h
• The migration is being done offline and the instance can’t be down so long
• The upgraded instance has to be tested before going live
• Strategy
• Snapshot the instance and migrate the copy
• Perform tests on it
• Top-up the changes introduced after snapshot
CASE STUDY
INTRODUCTION
#evolverocks 11
CASE STUDY
STRATEGY
#evolverocks 12
• The migration (4) will be much faster, as only the diff will be migrated
• In the (4) use --skip-init, so the existing repository won’t be reinitialized
• Also, use --include-paths=/content/mysite to migrate only the modified
subtree
CASE STUDY
REMARKS
#evolverocks 13
• When using Mongo (either as source or destination), run CRX2Oak on the same
machine as Mongo primary
• If you don’t need version history for deleted nodes, use --copy-orphaned-
versions=false to make the migration faster
• CRX2Oak may be used to copy content between existing repositories. Use
following parameters:
• --skip-init, so the destination is not initialized with the index definitions,
• --{include,merge}-paths to refer which subtrees should be copied
• --copy-orphaned-versions=false
GENERAL MIGRATION TIPS
#evolverocks 14
• When upgrading CQ 5.x + S3, crx2oak calls AWS asking for length of each binary
• the lengths are stored in Oak but not in CRX2, so we have to ask about it
• For a large repositories it may slow down the whole migration
• It’s possible to pre-fetch all lengths, store them in a text file and configure CRX
(and therefore CRX2Oak) to use it
• More information:
• https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/upgrade
/blob/LengthCachingDataStore.html
• Sample configuration files:
• http://bit.ly/cq5-s3-upgrade
GENERAL MIGRATION TIPS
UPGRADING CQ 5.X STORING BINARIES IN AWS S3
#evolverocks 15
• UUID conflict exception
• may occur if the destination repository already exists (iterative migration)
• remember to add --copy-orphaned-versions=false
• when using --include-paths, include all modified paths:
• otherwise, if the page has been moved and we include only the destination path,
CRX2Oak won’t remove the page from its original position
• BlobId not found exception
• either source or destination blob store is not configured correctly
• Unable to delete referenced node
• probably CRX2Oak tries to overwrite the whole version storage (removing existing
versions)
• add --copy-orphaned-versions=false
TROUBLESHOOTING
#evolverocks 16
Official docs describes using the extension:
• java -jar aem-quickstart-6.2.0.jar -unpack # unpack the AEM jar
• java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare extension
config
• java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare OSGi config
• java -Xmx4096m -XX:MaxPermSize=2048M -jar aem-quickstart-6.2.0.jar -v -
x crx2oak -xargs -- -o migrate
For running the CRX2Oak manually, the last command should be replaced with:
• java -Xmx4096m -XX:MaxPermSize=2048M -jar crx-
quickstart/opt/helpers/crx2oak/crx2oak.jar [source] [destination]
USING EXTENSION VS RUNNING CRX2OAK
MANUALLY
#evolverocks 17
• All CRX2Oak versions offer similar features
• They differ in:
• Oak version used underneath (as the CRX2Oak starts a normal Oak repository)
• Index definitions created during the repository initialisation
• These both things are assigned to the AEM version and shouldn’t be mismatched
• Table of truth:
• CRX2Oak 1.2.x can be used with AEM 6.1 too, but it won’t have all the
advanced features
VERSIONS
AEM Oak CRX2Oak
AEM 6.0 1.0.x 1.0.x
AEM 6.1 1.2.x 1.3.x (sic!)
AEM 6.2 1.4.x 1.4.x
#evolverocks 18
• CRX2Oak downloads:
• https://repo.adobe.com/nexus/content/groups/public/com/adobe/granite/crx2oak/
• CRX2Oak documentation
• https://docs.adobe.com/docs/en/aem/6-2/deploy/upgrade/using-crx2oak.html
• oak-upgrade documentation:
• https://jackrabbit.apache.org/oak/docs/migration.html
RESOURCES
#evolverocks
THANK YOU!
http://tomek.rekawek.eu
@Tomek1024
rekawek@adobe.com

More Related Content

What's hot

Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
Databricks
 
Delta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDelta Lake: Optimizing Merge
Delta Lake: Optimizing Merge
Databricks
 
해외 사례로 보는 Billing for OpenStack Solution
해외 사례로 보는 Billing for OpenStack Solution해외 사례로 보는 Billing for OpenStack Solution
해외 사례로 보는 Billing for OpenStack Solution
Nalee Jang
 
Control your service resources with systemd
 Control your service resources with systemd  Control your service resources with systemd
Control your service resources with systemd
Marian Marinov
 
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCERCEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
Ceph Community
 
Elasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライドElasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライド
崇介 藤井
 
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
NTT DATA Technology & Innovation
 
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking ToolCeph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Community
 
H2O - the optimized HTTP server
H2O - the optimized HTTP serverH2O - the optimized HTTP server
H2O - the optimized HTTP server
Kazuho Oku
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
PingCAP
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
Sage Weil
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Flink Forward
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
DataStax
 
Webpack DevTalk
Webpack DevTalkWebpack DevTalk
Webpack DevTalk
Alessandro Bellini
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
Taro L. Saito
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
Taras Matyashovsky
 
NetflixにおけるPresto/Spark活用事例
NetflixにおけるPresto/Spark活用事例NetflixにおけるPresto/Spark活用事例
NetflixにおけるPresto/Spark活用事例
Amazon Web Services Japan
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
Mateusz Buśkiewicz
 
Hadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイントHadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイント
Cloudera Japan
 
普通のRailsアプリをdockerで本番運用する知見
普通のRailsアプリをdockerで本番運用する知見普通のRailsアプリをdockerで本番運用する知見
普通のRailsアプリをdockerで本番運用する知見
zaru sakuraba
 

What's hot (20)

Hive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas PatilHive Bucketing in Apache Spark with Tejas Patil
Hive Bucketing in Apache Spark with Tejas Patil
 
Delta Lake: Optimizing Merge
Delta Lake: Optimizing MergeDelta Lake: Optimizing Merge
Delta Lake: Optimizing Merge
 
해외 사례로 보는 Billing for OpenStack Solution
해외 사례로 보는 Billing for OpenStack Solution해외 사례로 보는 Billing for OpenStack Solution
해외 사례로 보는 Billing for OpenStack Solution
 
Control your service resources with systemd
 Control your service resources with systemd  Control your service resources with systemd
Control your service resources with systemd
 
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCERCEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
CEPH DAY BERLIN - MASTERING CEPH OPERATIONS: UPMAP AND THE MGR BALANCER
 
Elasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライドElasticsearchを使うときの注意点 公開用スライド
Elasticsearchを使うときの注意点 公開用スライド
 
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
 
Ceph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking ToolCeph Tech Talk -- Ceph Benchmarking Tool
Ceph Tech Talk -- Ceph Benchmarking Tool
 
H2O - the optimized HTTP server
H2O - the optimized HTTP serverH2O - the optimized HTTP server
H2O - the optimized HTTP server
 
TiDB for Big Data
TiDB for Big DataTiDB for Big Data
TiDB for Big Data
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
Operations, Consistency, Failover for Multi-DC Clusters (Alexander Dejanovski...
 
Webpack DevTalk
Webpack DevTalkWebpack DevTalk
Webpack DevTalk
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
NetflixにおけるPresto/Spark活用事例
NetflixにおけるPresto/Spark活用事例NetflixにおけるPresto/Spark活用事例
NetflixにおけるPresto/Spark活用事例
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
Hadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイントHadoopのシステム設計・運用のポイント
Hadoopのシステム設計・運用のポイント
 
普通のRailsアプリをdockerで本番運用する知見
普通のRailsアプリをdockerで本番運用する知見普通のRailsアプリをdockerで本番運用する知見
普通のRailsアプリをdockerで本番運用する知見
 

Similar to CRX2Oak - all the secrets of repository migration

Postgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh ShahPostgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh Shah
PivotalOpenSourceHub
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux Containers
Jignesh Shah
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
Evan Chan
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
Spark Summit
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]
Mahmoud Hatem
 
les01.pdf
les01.pdfles01.pdf
les01.pdf
VAMSICHOWDARY61
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
kanedafromparis
 
OGG Architecture Performance
OGG Architecture PerformanceOGG Architecture Performance
OGG Architecture PerformanceEnkitec
 
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Bobby Curtis
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
thelabdude
 
Docking postgres
Docking postgresDocking postgres
Docking postgres
rycamor
 
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder
 
Oracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture PerformanceOracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture PerformanceEnkitec
 
Simon Bennetts - Automating ZAP
Simon Bennetts - Automating ZAP Simon Bennetts - Automating ZAP
Simon Bennetts - Automating ZAP
DevSecCon
 
Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk
Simon Bennetts
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on Cloud
Qubole
 
Java and Containers - Make it Awesome !
Java and Containers - Make it Awesome !Java and Containers - Make it Awesome !
Java and Containers - Make it Awesome !
Dinakar Guniguntala
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Tim Callaghan
 
Training Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & RecoveryTraining Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & Recovery
Continuent
 
Creating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just WorksCreating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just Works
Tim Callaghan
 

Similar to CRX2Oak - all the secrets of repository migration (20)

Postgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh ShahPostgre sql linuxcontainers by Jignesh Shah
Postgre sql linuxcontainers by Jignesh Shah
 
PostgreSQL and Linux Containers
PostgreSQL and Linux ContainersPostgreSQL and Linux Containers
PostgreSQL and Linux Containers
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]The power of linux advanced tracer [POUG18]
The power of linux advanced tracer [POUG18]
 
les01.pdf
les01.pdfles01.pdf
les01.pdf
 
Ippevent : openshift Introduction
Ippevent : openshift IntroductionIppevent : openshift Introduction
Ippevent : openshift Introduction
 
OGG Architecture Performance
OGG Architecture PerformanceOGG Architecture Performance
OGG Architecture Performance
 
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
Oracle GoldenGate Presentation from OTN Virtual Technology Summit - 7/9/14 (PDF)
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Docking postgres
Docking postgresDocking postgres
Docking postgres
 
Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)Tanel Poder Oracle Scripts and Tools (2010)
Tanel Poder Oracle Scripts and Tools (2010)
 
Oracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture PerformanceOracle GoldenGate Architecture Performance
Oracle GoldenGate Architecture Performance
 
Simon Bennetts - Automating ZAP
Simon Bennetts - Automating ZAP Simon Bennetts - Automating ZAP
Simon Bennetts - Automating ZAP
 
Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk Automating OWASP ZAP - DevCSecCon talk
Automating OWASP ZAP - DevCSecCon talk
 
Running Spark on Cloud
Running Spark on CloudRunning Spark on Cloud
Running Spark on Cloud
 
Java and Containers - Make it Awesome !
Java and Containers - Make it Awesome !Java and Containers - Make it Awesome !
Java and Containers - Make it Awesome !
 
Performance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons LearnedPerformance Benchmarking: Tips, Tricks, and Lessons Learned
Performance Benchmarking: Tips, Tricks, and Lessons Learned
 
Training Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & RecoveryTraining Slides: 203 - Backup & Recovery
Training Slides: 203 - Backup & Recovery
 
Creating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just WorksCreating a Benchmarking Infrastructure That Just Works
Creating a Benchmarking Infrastructure That Just Works
 

More from Tomasz Rękawek

Radio ad blocker
Radio ad blockerRadio ad blocker
Radio ad blocker
Tomasz Rękawek
 
Deep-dive into cloud-native AEM deployments based on Kubernetes
Deep-dive into cloud-native AEM deployments based on KubernetesDeep-dive into cloud-native AEM deployments based on Kubernetes
Deep-dive into cloud-native AEM deployments based on Kubernetes
Tomasz Rękawek
 
Emulating Game Boy in Java
Emulating Game Boy in JavaEmulating Game Boy in Java
Emulating Game Boy in Java
Tomasz Rękawek
 
Zero downtime deployments for the Sling-based apps using Docker
Zero downtime deployments for the Sling-based apps using DockerZero downtime deployments for the Sling-based apps using Docker
Zero downtime deployments for the Sling-based apps using Docker
Tomasz Rękawek
 
Inter-Sling communication with message queue
Inter-Sling communication with message queueInter-Sling communication with message queue
Inter-Sling communication with message queue
Tomasz Rękawek
 
Sling Dynamic Include
Sling Dynamic IncludeSling Dynamic Include
Sling Dynamic Include
Tomasz Rękawek
 
Shooting rabbits with sling
Shooting rabbits with slingShooting rabbits with sling
Shooting rabbits with sling
Tomasz Rękawek
 

More from Tomasz Rękawek (9)

Radio ad blocker
Radio ad blockerRadio ad blocker
Radio ad blocker
 
Deep-dive into cloud-native AEM deployments based on Kubernetes
Deep-dive into cloud-native AEM deployments based on KubernetesDeep-dive into cloud-native AEM deployments based on Kubernetes
Deep-dive into cloud-native AEM deployments based on Kubernetes
 
Emulating Game Boy in Java
Emulating Game Boy in JavaEmulating Game Boy in Java
Emulating Game Boy in Java
 
Zero downtime deployments for the Sling-based apps using Docker
Zero downtime deployments for the Sling-based apps using DockerZero downtime deployments for the Sling-based apps using Docker
Zero downtime deployments for the Sling-based apps using Docker
 
SlingQuery
SlingQuerySlingQuery
SlingQuery
 
Code metrics
Code metricsCode metrics
Code metrics
 
Inter-Sling communication with message queue
Inter-Sling communication with message queueInter-Sling communication with message queue
Inter-Sling communication with message queue
 
Sling Dynamic Include
Sling Dynamic IncludeSling Dynamic Include
Sling Dynamic Include
 
Shooting rabbits with sling
Shooting rabbits with slingShooting rabbits with sling
Shooting rabbits with sling
 

Recently uploaded

Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
Tendenci - The Open Source AMS (Association Management Software)
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 

Recently uploaded (20)

Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 

CRX2Oak - all the secrets of repository migration

  • 1. #evolverocks CRX2OAK – ALL THE SECRETS OF REPOSITORY MIGRATION TOMEK RĘKAWEK, ADOBE RESEARCH Aug 30, 2016
  • 2. #evolverocks 2 • Overview of CRX2Oak • CRX2Oak command line • Features • Case study: large migration • General migration tips • Using CRX2Oak for AEM upgrade • Q & (hopefully) A AGENDA
  • 3. #evolverocks 3 OVERVIEW OF THE CRX2OAK UPGRADE FROM CRX2 CQ 5.x – CRX2 AEM 6.x – Jackrabbit Oak
  • 4. #evolverocks 4 OVERVIEW OF THE CRX2OAK UPGRADE OR SIDEGRADE CQ 5.x – CRX2 AEM 6.x – Jackrabbit Oak AEM 6.x – Oak
  • 5. #evolverocks 5 OVERVIEW OF THE CRX2OAK MIGRATING BINARIES
  • 6. #evolverocks 6 • CRX2Oak is a command-line tool: • java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET • Source and target defines the repositories. Supported formats: • path to the CRX2 “repository” directory, eg. crx-quickstart/repository • path to the Oak SegmentMK “repository” directory, as above • Mongo URI, eg. mongodb://localhost:27017/aem • JDBC URI, eg. jdbc:mysql://localhost:3306/sakila?profileSQL=true CRX2OAK COMMAND LINE REPOSITORY PARAMETER TYPES
  • 7. #evolverocks 7 • java -jar crx2oak.jar [options] [datastore-options] SOURCE TARGET • The source blob store is defined using: --src-datastore or --src-s3datastore. • If there’s no blob store defined for source, CRX2Oak assumes embedded • If the source blob store is defined, it will be used for target as well (only references will be copied, not actual binaries) • It can be overridden with --copy-binaries • Destination blob store can be defined with: --datastore or --s3datastore CRX2OAK COMMAND LINE DEFINING DATASTORE TO BE USED
  • 10. #evolverocks 10 • Client requirements • CQ 5.6.1 instance with a large number of sites and assets, storing binaries in S3 • The content is being authored 24/7 • The migration of the whole content takes about 20h • The migration is being done offline and the instance can’t be down so long • The upgraded instance has to be tested before going live • Strategy • Snapshot the instance and migrate the copy • Perform tests on it • Top-up the changes introduced after snapshot CASE STUDY INTRODUCTION
  • 12. #evolverocks 12 • The migration (4) will be much faster, as only the diff will be migrated • In the (4) use --skip-init, so the existing repository won’t be reinitialized • Also, use --include-paths=/content/mysite to migrate only the modified subtree CASE STUDY REMARKS
  • 13. #evolverocks 13 • When using Mongo (either as source or destination), run CRX2Oak on the same machine as Mongo primary • If you don’t need version history for deleted nodes, use --copy-orphaned- versions=false to make the migration faster • CRX2Oak may be used to copy content between existing repositories. Use following parameters: • --skip-init, so the destination is not initialized with the index definitions, • --{include,merge}-paths to refer which subtrees should be copied • --copy-orphaned-versions=false GENERAL MIGRATION TIPS
  • 14. #evolverocks 14 • When upgrading CQ 5.x + S3, crx2oak calls AWS asking for length of each binary • the lengths are stored in Oak but not in CRX2, so we have to ask about it • For a large repositories it may slow down the whole migration • It’s possible to pre-fetch all lengths, store them in a text file and configure CRX (and therefore CRX2Oak) to use it • More information: • https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/oak/upgrade /blob/LengthCachingDataStore.html • Sample configuration files: • http://bit.ly/cq5-s3-upgrade GENERAL MIGRATION TIPS UPGRADING CQ 5.X STORING BINARIES IN AWS S3
  • 15. #evolverocks 15 • UUID conflict exception • may occur if the destination repository already exists (iterative migration) • remember to add --copy-orphaned-versions=false • when using --include-paths, include all modified paths: • otherwise, if the page has been moved and we include only the destination path, CRX2Oak won’t remove the page from its original position • BlobId not found exception • either source or destination blob store is not configured correctly • Unable to delete referenced node • probably CRX2Oak tries to overwrite the whole version storage (removing existing versions) • add --copy-orphaned-versions=false TROUBLESHOOTING
  • 16. #evolverocks 16 Official docs describes using the extension: • java -jar aem-quickstart-6.2.0.jar -unpack # unpack the AEM jar • java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare extension config • java -jar aem-quickstart-6.2.0.jar -v -x crx2oak # prepare OSGi config • java -Xmx4096m -XX:MaxPermSize=2048M -jar aem-quickstart-6.2.0.jar -v - x crx2oak -xargs -- -o migrate For running the CRX2Oak manually, the last command should be replaced with: • java -Xmx4096m -XX:MaxPermSize=2048M -jar crx- quickstart/opt/helpers/crx2oak/crx2oak.jar [source] [destination] USING EXTENSION VS RUNNING CRX2OAK MANUALLY
  • 17. #evolverocks 17 • All CRX2Oak versions offer similar features • They differ in: • Oak version used underneath (as the CRX2Oak starts a normal Oak repository) • Index definitions created during the repository initialisation • These both things are assigned to the AEM version and shouldn’t be mismatched • Table of truth: • CRX2Oak 1.2.x can be used with AEM 6.1 too, but it won’t have all the advanced features VERSIONS AEM Oak CRX2Oak AEM 6.0 1.0.x 1.0.x AEM 6.1 1.2.x 1.3.x (sic!) AEM 6.2 1.4.x 1.4.x
  • 18. #evolverocks 18 • CRX2Oak downloads: • https://repo.adobe.com/nexus/content/groups/public/com/adobe/granite/crx2oak/ • CRX2Oak documentation • https://docs.adobe.com/docs/en/aem/6-2/deploy/upgrade/using-crx2oak.html • oak-upgrade documentation: • https://jackrabbit.apache.org/oak/docs/migration.html RESOURCES