Repository performance tuning
Jukka Zitting | Senior Developer

Agenda
- Performance tuning steps
- Repository internals
- Basic content access
- Batch processing
- Clustering
- Query performance
- Full text indexing
- Questions and answers

Performance tuning steps
Step 1: Identify the symptom
- Create a test case that consistently measures current performance
- Define the performance target if the current level is unacceptable
- Make sure that the test case and the target performance are really relevant
Step 2: Identify the cause
- Main suspects: hardware, repository, application, client
- Revise the test case until the problem no longer occurs, for example by narrowing it down layer by layer: Selenium (client), JMeter (application), JUnit (repository), Iometer (hardware)
Step 3: Identify/implement possible solutions
- Change content, configuration or code, or upgrade hardware
Step 4: Verify results
- If the target is not reached, iterate the process or revise the goal

Repository internals
- Data Store
- Persistence Manager
- Query Index
- Cluster Journal

Data Store
- Content-addressed storage for large binary properties
  - Arbitrarily sized binary streams
  - Addressed by a hash of the content (SHA-1 in Jackrabbit's FileDataStore)
  - String properties are not included; to store a large string here, encode it as UTF-8 and save it as a binary
- Fast delivery of binary content
  - Read directly from disk
  - Can also be read in ranges
- Improved write throughput
  - Multiple uploads can proceed concurrently (within hardware limits)
  - Cheap copies
- Garbage collection used to reclaim disk space
- Logically shared by the entire cluster

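As a rough sketch of how a data store is switched on in stock Apache Jackrabbit, the fragment below adds a FileDataStore to repository.xml. The path and the minRecordLength value are illustrative assumptions rather than recommendations from the talk, and a CRX installation may package this differently.

```xml
<!-- Fragment of repository.xml; a sketch, values are assumptions -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
  <!-- Directory holding the content-addressed files (assumed location) -->
  <param name="path" value="${rep.home}/repository/datastore"/>
  <!-- Binaries smaller than this many bytes stay inline with the node -->
  <param name="minRecordLength" value="100"/>
</DataStore>
```

In a cluster, every node would point the path at the same shared location, which is what "logically shared by the entire cluster" amounts to for a file-based store.
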
Cluster Journal
- Journal of all persisted changes in the repository
  - Content changes
  - Namespace, node type registrations, etc.
- Used to keep all cluster nodes in sync
  - Observation events to all cluster nodes (see JackrabbitEvent.isExternal)
  - Search index updates
  - Internal cache invalidation
- Old events need to be discarded eventually
  - No notable performance impact, just extra disk space
  - Keep events for the longest possible time a node can be offline without getting completely recreated
- Logically shared by the entire cluster
  - Writes synchronized over the entire cluster

Persistence Manager
- Identifier-addressed storage for nodes and properties
  - Each node has a UUID, even if not mix:referenceable
  - Essentially a key-value store, even when backed by an RDBMS
  - Also keeps track of node references
- Bundles as units of content
  - Bundle = UUID, type, properties, child node references, etc.
  - Only large binaries are stored elsewhere, in the data store
  - Designed for balanced content hierarchies; avoid too many child nodes under one parent
- Atomic updates
  - A save() call persists the entire transient space as a single atomic operation
- One PM per workspace (and one for the shared version store)
- Logically (often also physically) shared across a cluster

Query Index
- Inverted index based on Apache Lucene
  - Flexible mapping from terms to node identifiers
  - Special handling for the path structure
- Mostly synchronous index updates
  - Long full text extraction tasks handled in the background
  - Other cluster nodes will update their indexes at the next cluster sync
- Everything indexed by default
  - Indexing configuration for tweaking functionality, performance and disk usage
- One index per workspace (and one for the shared version store)
- Not shared across a cluster; indexes are local to each cluster node
- See http://wiki.apache.org/jackrabbit/Search#Search_Configuration

Agenda
- Performance tuning steps
- Repository internals
- Basic content access
- Batch processing
- Clustering
- Query performance
- Indexing configuration
- Questions and answers

Basic content access
- Very fast access by path and ID
  - Underlying storage is addressed by ID, but path traversal is in any case needed for ACL checks
- Relevant caches:
  - Path to ID map (internal structure, not configurable)
  - Item state caches (automatically balanced, configurable for special cases)
  - Bundle cache (default fairly low, increase for large deployments)
  - Also some PM-specific options (TarPM index, etc.)
- Caches optimized for a reasonably sized active working set
  - Typical web access pattern: a handful of key resources and a long tail of less frequently accessed content, few writes
- Performance hit especially when updating nodes with lots of child nodes
- FineGrainedISMLocking for concurrent, non-overlapping writes

Example: Bundle cache configuration

    <!-- In …/repository/workspaces/${wsp.name}/workspace.xml -->
    <Workspace …>
      <PersistenceManager class="…">
        <!-- Bundle cache size in megabytes -->
        <param name="bundleCacheSize" value="8"/>
      </PersistenceManager>
    </Workspace>

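The FineGrainedISMLocking option mentioned under basic content access is also set per workspace. A minimal sketch, assuming a stock Apache Jackrabbit workspace.xml (the other child elements are left out here and must stay as they are in a real file):

```xml
<!-- Fragment of workspace.xml; a sketch, not a complete configuration -->
<Workspace name="${wsp.name}">
  <!-- FileSystem, PersistenceManager and SearchIndex elements omitted -->
  <!-- Lets writes that touch non-overlapping items proceed concurrently -->
  <ISMLocking class="org.apache.jackrabbit.core.state.FineGrainedISMLocking"/>
</Workspace>
```

Whether this helps depends on the write pattern: it is aimed at concurrent writes to separate parts of the tree, not at many writers updating the same parent node.
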
Batch processing
- Two issues: read and write
- Reading lots of content
  - Tree traversal is the best approach, but will flood caches
  - Schedule for off-peak times
  - Add an explicit delay (used by the garbage collectors)
  - Use a dedicated cluster node for batch processing
- Writing lots of content (including deleting large subtrees)
  - The entire transient space is kept in memory and committed atomically
  - Split the operation into smaller pieces
  - Save after every ~1k nodes
  - Leverage the data store if possible

Clustering
- Good for horizontally scaling reads
  - Practically zero overhead on read access
- Not so good for heavy concurrent writes
  - Exclusive lock over the whole cluster
  - Direct all writes to a single master node
  - Leverage the data store
- Note the cluster sync interval for query consistency, etc.
  - Session.refresh() can be used to force a cluster sync

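To make the cluster sync interval concrete, here is a sketch of a cluster section in a stock Apache Jackrabbit repository.xml. The node id, the journal directory and the 2000 ms syncDelay are assumed values for illustration only; CRX ships its own journal and persistence implementations, so class names may differ there.

```xml
<!-- Fragment of repository.xml; a sketch with assumed values -->
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.FileJournal">
    <!-- Local file recording the last journal revision this node has applied -->
    <param name="revision" value="${rep.home}/revision.log"/>
    <!-- Journal directory on storage shared by all cluster nodes (assumed path) -->
    <param name="directory" value="/nfs/myserver/myjournal"/>
  </Journal>
</Cluster>
```

The syncDelay attribute is given in milliseconds. A shorter delay narrows the window in which a node can return stale query results, at the cost of polling the journal more often.
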
Query performance
- What's really fast?
  - Constraints on properties, node types, full text
  - Typically O(n) where n is the number of results, vs. the total number of nodes
- What's pretty fast?
  - Path constraints
- What needs some planning?
  - Constraints on the child axis
  - Sorting, limit/offset
  - Joins
- What's not yet available?
  - Aggregate queries (COUNT, SUM, DISTINCT, etc.)
  - Faceting

Join engine

    SELECT a.* FROM [nt:unstructured] AS a JOIN [nt:unstructured] AS b

Indexing configuration
- Default configuration
  - Index all non-binary properties
  - Index binary jcr:data properties (think nt:file/nt:resource)
    - Full text extraction support for all major document formats
    - Full text extraction from images, packages, etc. is explicitly disabled
  - CQ5 / WEM comes with default aggregate indexing rules for cq:Pages, etc.
- Why change the configuration?
  - Reduce the index size (by default almost as large as the PM)
  - Enable features like aggregate indexes
  - Assign boost values for selected properties to improve search result relevance

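Two of the reasons above, a smaller index and boost values, can be covered by a single index rule. The sketch below follows the Jackrabbit indexing configuration format; the property names title and text are hypothetical and stand in for whatever an application actually stores.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- For nt:unstructured nodes, index only the properties listed here -->
  <index-rule nodeType="nt:unstructured">
    <!-- Matches in title weigh more than matches in text (names are hypothetical) -->
    <property boost="2.0">title</property>
    <property>text</property>
  </index-rule>
</configuration>
```

Once a rule applies to a node type, properties it does not list are left out of the index for those nodes. That is what shrinks the index, and it also means queries on the omitted properties stop matching.
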
Indexing configuration
- How to change the configuration?
  - indexing_configuration.xml file in the workspace directory
  - Referenced by the indexingConfiguration option in the workspace.xml file
  - See http://wiki.apache.org/jackrabbit/IndexingConfiguration
- Example:

    <?xml version="1.0"?>
    <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
    <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
                   xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
      <!-- Index the jcr:content child together with its nt:file parent -->
      <aggregate primaryType="nt:file">
        <include>jcr:content</include>
      </aggregate>
    </configuration>

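The reference from workspace.xml to the indexing configuration file could look roughly like this in stock Apache Jackrabbit. The index path and the file location are assumptions that follow the usual workspace layout.

```xml
<!-- Fragment of workspace.xml; a sketch with assumed paths -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <!-- Directory of the Lucene index for this workspace -->
  <param name="path" value="${wsp.home}/index"/>
  <!-- Points the index at the indexing configuration file -->
  <param name="indexingConfiguration" value="${wsp.home}/indexing_configuration.xml"/>
</SearchIndex>
```

Changes to the indexing configuration generally affect only content indexed afterwards, so existing content typically needs to be re-indexed before the new rules are fully visible in search results.
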
Questions and answers
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 

Repository performance tuning

  • 1. Jukka Zitting | Senior Developer Repository performance tuning
  • 2. Agenda: Performance tuning steps; Repository internals; Basic content access; Batch processing; Clustering; Query performance; Full text indexing; Questions and answers
  • 3. Performance tuning steps. Step 1: Identify the symptom. Create a test case that consistently measures current performance; define the performance target if current level unacceptable; make sure that the test case and the target performance are really relevant. Step 2: Identify the cause. Main suspects: hardware, repository, application, client. Revise the test case until the problem no longer occurs; for example: Selenium, JMeter, JUnit, Iometer. Step 3: Identify/implement possible solutions. Change content, configuration, code or upgrade hardware. Step 4: Verify results. If target not reached, iterate the process or revise the goal.
  • 4. Repository internals: Data Store, Persistence Manager, Query Index, Cluster Journal
  • 5. Data Store. Content-addressed storage for large binary properties: arbitrarily sized binary streams, addressed by MD5 hash; String properties not included, use UTF-8 to map to binary. Fast delivery of binary content: read directly from disk; can also be read in ranges. Improved write throughput: multiple uploads can proceed concurrently (within hardware limits); cheap copies. Garbage collection used to reclaim disk space. Logically shared by the entire cluster.
  • 6. Cluster Journal. Journal of all persisted changes in the repository: content changes; namespace, nodetype registrations, etc. Used to keep all cluster nodes in sync: observation events to all cluster nodes (see JackrabbitEvent.isExternal), search index updates, internal cache invalidation. Old events need to be discarded eventually: no notable performance impact, just extra disk space; keep events for the longest possible time a node can be offline without getting completely recreated. Logically shared by the entire cluster. Writes synchronized over the entire cluster.
  • 7. Persistence Manager. Identifier-addressed storage for nodes and properties: each node has a UUID, even if not mix:referenceable; essentially a key-value store, even when backed by an RDBMS; also keeps track of node references. Bundles as units of content: bundle = UUID, type, properties, child node references, etc.; only large binaries stored elsewhere in the data store; designed for balanced content hierarchies, avoid too many child nodes. Atomic updates: a save() call persists the entire transient space as a single atomic operation. One PM per workspace (and one for the shared version store). Logically (often also physically) shared across a cluster.
  • 8. Query Index. Inverted index based on Apache Lucene: flexible mapping from terms to node identifiers; special handling for the path structure. Mostly synchronous index updates: long full text extraction tasks handled in background; other cluster nodes will update their indexes at next cluster sync. Everything indexed by default: indexing configuration for tweaking functionality, performance and disk usage. One index per workspace (and one for the shared version store). Not shared across a cluster, indexes are local to each cluster node. See http://wiki.apache.org/jackrabbit/Search#Search_Configuration
  • 9. Agenda: Performance tuning steps; Repository internals; Basic content access; Batch processing; Clustering; Query performance; Indexing configuration; Questions and answers
  • 10. Basic content access. Very fast access by path and ID: underlying storage addressed by ID, but path traversal is in any case needed for ACL checks. Relevant caches: path to ID map (internal structure, not configurable); item state caches (automatically balanced, configurable for special cases); bundle cache (default fairly low, increase for large deployments); also some PM-specific options (TarPM index, etc.). Caches optimized for a reasonably sized active working set; typical web access pattern: handful of key resources and a long tail of less frequently accessed content, few writes. Performance hit especially when updating nodes with lots of child nodes. FineGrainedISMLocking for concurrent, non-overlapping writes.
  • 11. Example: Bundle cache configuration. <!-- In …/repository/workspaces/${wsp.name}/workspace.xml --> <Workspace …> <PersistenceManager class="…"> <param name="bundleCacheSize" value="8"/> </PersistenceManager> </Workspace>
  • 12. Batch processing. Two issues: read and write. Reading lots of content: tree traversal the best approach, but will flood caches; schedule for off-peak times; add explicit delay (used by the garbage collectors); use a dedicated cluster node for batch processing. Writing lots of content (including deleting large subtrees): the entire transient space is kept in memory and committed atomically; split the operation to smaller pieces; save after every ~1k nodes; leverage the data store if possible.
  • 13. Clustering. Good for horizontally scaling reads: practically zero overhead on read access. Not so good for heavy concurrent writes: exclusive lock over the whole cluster; direct all writes to a single master node; leverage the data store. Note the cluster sync interval for query consistency, etc.; Session.refresh() can be used to force a cluster sync.
  • 14. Query performance. What’s really fast? Constraints on properties, node types, full text; typically O(n) where n is the number of results, vs. the total number of nodes. What’s pretty fast? Path constraints. What needs some planning? Constraints on the child axis; sorting, limit/offset; joins. What’s not yet available? Aggregate queries (COUNT, SUM, DISTINCT, etc.); faceting.
  • 15. Join engine. SELECT a.* FROM [nt:unstructured] AS a JOIN [nt:unstructured] AS b
  • 16. Indexing configuration. Default configuration: index all non-binary properties; index binary jcr:data properties (think nt:file/nt:resource); full text extraction support for all major document formats; full text extraction from images, packages, etc. is explicitly disabled; CQ5 / WEM comes with default aggregate indexing rules for cq:Pages, etc. Why change the configuration? Reduce the index size (by default almost as large as the PM); enable features like aggregate indexes; assign boost values for selected properties to improve search result relevance.
  • 17. Indexing configuration. How to change the configuration? indexing_configuration.xml file in the workspace directory, referenced by the indexingConfiguration option in the workspace.xml file. See http://wiki.apache.org/jackrabbit/IndexingConfiguration Example: <?xml version="1.0"?> <!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd"> <configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"> <aggregate primaryType="nt:file"> <include>jcr:content</include> </aggregate> </configuration>
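
The batch-write advice on slide 12 (split the operation into smaller pieces, save after every ~1k nodes) can be sketched roughly as follows. This is a minimal illustration, not code from the talk: SaveTarget and ItemWriter are hypothetical stand-ins (SaveTarget plays the role of a JCR session's save() call, ItemWriter the role of whatever adds one node to the transient space), and the batch size of 1000 is just the slide's ballpark figure.

```java
/** Hypothetical stand-in for the save() call of a JCR session; not the real JCR API. */
interface SaveTarget {
    void save();
}

/** Hypothetical callback that writes one item (e.g. one node) into the transient space. */
interface ItemWriter {
    void write(int index);
}

final class BatchSaveSketch {

    /**
     * Writes itemCount items, calling save() after every batchSize items and
     * once more at the end if a partial batch remains, so the transient space
     * never holds more than batchSize unsaved items.
     *
     * @return the number of save() calls made
     */
    static int writeInBatches(int itemCount, int batchSize,
                              ItemWriter writer, SaveTarget target) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int saves = 0;
        int pending = 0; // items written since the last save()
        for (int i = 0; i < itemCount; i++) {
            writer.write(i);
            pending++;
            if (pending == batchSize) {
                target.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the final partial batch
            target.save();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        int[] written = {0};
        // No-op save target; a real caller would delegate to its session here.
        int saves = writeInBatches(2500, 1000, i -> written[0]++, () -> { });
        System.out.println(written[0] + " items written in " + saves + " saves");
    }
}
```

The trade-off is that each batch is atomic on its own, but the operation as a whole no longer is, so a failure midway leaves the earlier batches persisted.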