SlideShare a Scribd company logo
1 of 19
Jukka Zitting  |  Senior Developer Repository performance tuning
Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Full text indexing Questions and answers 2
Performance tuning steps Step 1: Identify the symptom Create a test case that consistently measures current performance Define the performance target if current level unacceptable Make sure that the test case and the target performance are really relevant Step 2: Identify the cause Main suspects: Hardware, Repository, Application, Client Revise the test case until the problem no longer occurs;for example: Selenium, JMeter, JUnit, Iometer Step 3: Identify/implement possible solutions Change content, configuration, code or upgrade hardware Step 4: Verify results If target not reached, iterate the process or revise the goal 3
Repository internals 4 Data Store Persistence Manager Query Index Cluster Journal
Data Store Content-addressed storage for large binary properties Arbitrarily sized binary streams Addressed by MD5 hash String properties not included, use UTF-8 to map to binary Fast delivery of binary content Read directly from disk Can also be read in ranges Improved write throughput Multiple uploads can proceed concurrently (within hardware limits) Cheap copies Garbage collection used to reclaim disk space Logically shared by the entire cluster 5 Data Store
Cluster Journal Journal of all persisted changes in the repository Content changes Namespace, nodetype registrations, etc. Used to keep all cluster nodes in sync Observation events to all cluster nodes (see JackrabbitEvent.isExternal) Search index updates Internal cache invalidation Old events need to be discarded eventually No notable performance impact, just extra disk space Keep events for the longest possible time a node can be offline without getting completely recreated Logically shared by the entire cluster Writes synchronized over the entire cluster 6 Cluster Journal
Persistence Manager Identifier-addressed storage for nodes and properties Each node has a UUID, even if not mix:referenceable Essentially a key-value store, even when backed by a RDBMS Also keeps track of node references Bundles as units of content Bundle = UUID, type, properties, child node references, etc. Only large binaries stored elsewhere in the data store Designed for balanced content hierarchies, avoid too many child nodes Atomic updates A save() call persists the entire transient space as a single atomic operation One PM per workspace (and one for the shared version store) Logically (often also physically) shared across a cluster 7 Persistence Manager
Query Index Inverse index based on Apache Lucene Flexible mapping from terms to node identifiers Special handling for the path structure Mostly synchronous index updates Long full text extraction tasks handled in background Other cluster nodes will update their indexes at next cluster sync  Everything indexed by default Indexing configuration for tweaking functionality, performance and disk usage One index per workspace (and one for the shared version store) Not shared across a cluster, indexes are local to each cluster node See http://wiki.apache.org/jackrabbit/Search#Search_Configuration 8 Query Index
Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Indexing configuration Questions and answers 9
Basic content access Very fast access by path and ID Underlying storage addressed by ID, but path traversal is in any case needed for ACL checks Relevant caches: Path to ID map (internal structure, not configurable) Item state caches (automatically balanced, configurable for special cases) Bundle cache (default fairly low, increase for large deployments) Also some PM-specific options (TarPM index, etc.) Caches optimized for a reasonably sized active working set typical web access pattern: handful of key resources and a long tail of less frequently accessed content, few writes Performance hit especially when updating nodes with lots of child nodes FineGrainedISMLocking for concurrent, non-overlapping writes 10
Example: Bundle cache configuration 11 <!-- In …/repository/worspaces/${wsp.name}/workspace.xml --> <Workspace …>   <PersistenceManager class=“…">   <paramname="bundleCacheSize" value="8"/>   </PersistenceManager> </Workspace>
Batch processing Two issues: read and write Reading lots of content Tree traversal the best approach, but will flood caches Schedule for off-peak times Add explicit delay (used by the garbage collectors) Use a dedicated cluster node for batch processing Writing lots of content (including deleting large subtrees) The entire transient space is kept in memory and committed atomically Split the operation to smaller pieces Save after every ~1k nodes Leverage the data store if possible 12
Clustering Good for horizontally scaling reads Practically zero overhead on read access Not so good for heavy concurrent writes Exclusive lock over the whole cluster Direct all writes to a single master node Leverage the data store Note the cluster sync interval for query consistency, etc. Session.refresh() can be used to force a cluster sync 13
Query performance What’s really fast? Constraints on properties, node types, full text Typically O(n) where n is the number of results, vs. the total number of nodes  What’s pretty fast? Path constraints What needs some planning? Constraints on the child axis Sorting, limit/offset  Joins What’s not yet available? Aggregate queries (COUNT, SUM, DISTINCT, etc.) Faceting 14
Join engine 15 SELECT a.* FROM [nt:unstructured] AS a JOIN [nt:unstructured] AS b   <PersistenceManager class=“…">   <paramname="bundleCacheSize" value="8"/>   </PersistenceManager> </Workspace>
Indexing configuration Default configuration Index all non-binary properties Index binary jcr:data properties (think nt:file/nt:resource) Full text extraction support for all major document formats Full text extraction from images, packages, etc. is explicitly disabled CQ5 / WEM comes with default aggregate indexing rules for cq:Pages, etc. Why change the configuration? Reduce the index size (by default almost as large as the PM) Enable features like aggregate indexes Assign boost values for selected properties to improve search result relevance 16
Indexing configuration How to change the configuration? indexing_configuration.xml file in the workspace directory Referenced by the indexingConfiguration option in the workspace.xml file See http://wiki.apache.org/jackrabbit/IndexingConfiguration Example: 17 <?xml version="1.0"?><!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd"><configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">   <aggregateprimaryType="nt:file">    <include>jcr:content</include>  </aggregate> </configuration>
Question and Answers 18
Repository performance tuning

More Related Content

What's hot

[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅NAVER D2
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewDmitry Tolpeko
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Data discovery & metadata management (amundsen installation)
Data discovery & metadata management (amundsen installation)Data discovery & metadata management (amundsen installation)
Data discovery & metadata management (amundsen installation)창언 정
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
Stream Processing 과 Confluent Cloud 시작하기
Stream Processing 과 Confluent Cloud 시작하기Stream Processing 과 Confluent Cloud 시작하기
Stream Processing 과 Confluent Cloud 시작하기confluent
 
Kafka monitoring using Prometheus and Grafana
Kafka monitoring using Prometheus and GrafanaKafka monitoring using Prometheus and Grafana
Kafka monitoring using Prometheus and Grafanawonyong hwang
 
RabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarRabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarErlang Solutions
 
What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0
What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0
What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0NGINX, Inc.
 
HTTP Request Smuggling
HTTP Request SmugglingHTTP Request Smuggling
HTTP Request SmugglingAkash Ashokan
 
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-RegionJi-Woong Choi
 
Resilience4j with Spring Boot
Resilience4j with Spring BootResilience4j with Spring Boot
Resilience4j with Spring BootKnoldus Inc.
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in KafkaJoel Koshy
 
[AWSKRUG 컨테이너 소모임] Rancher 기본 입문
[AWSKRUG 컨테이너 소모임] Rancher 기본 입문[AWSKRUG 컨테이너 소모임] Rancher 기본 입문
[AWSKRUG 컨테이너 소모임] Rancher 기본 입문Hyunmin Kim
 
Open source APM Scouter로 모니터링 잘 하기
Open source APM Scouter로 모니터링 잘 하기Open source APM Scouter로 모니터링 잘 하기
Open source APM Scouter로 모니터링 잘 하기GunHee Lee
 

What's hot (20)

[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Data discovery & metadata management (amundsen installation)
Data discovery & metadata management (amundsen installation)Data discovery & metadata management (amundsen installation)
Data discovery & metadata management (amundsen installation)
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Stream Processing 과 Confluent Cloud 시작하기
Stream Processing 과 Confluent Cloud 시작하기Stream Processing 과 Confluent Cloud 시작하기
Stream Processing 과 Confluent Cloud 시작하기
 
Kafka monitoring using Prometheus and Grafana
Kafka monitoring using Prometheus and GrafanaKafka monitoring using Prometheus and Grafana
Kafka monitoring using Prometheus and Grafana
 
RabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarRabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II Webinar
 
What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0
What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0
What’s New in NGINX Ingress Controller for Kubernetes Release 1.5.0
 
HTTP Request Smuggling
HTTP Request SmugglingHTTP Request Smuggling
HTTP Request Smuggling
 
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
 
Resilience4j with Spring Boot
Resilience4j with Spring BootResilience4j with Spring Boot
Resilience4j with Spring Boot
 
KAFKA 3.1.0.pdf
KAFKA 3.1.0.pdfKAFKA 3.1.0.pdf
KAFKA 3.1.0.pdf
 
Airflow 101
Airflow 101Airflow 101
Airflow 101
 
Consumer offset management in Kafka
Consumer offset management in KafkaConsumer offset management in Kafka
Consumer offset management in Kafka
 
[AWSKRUG 컨테이너 소모임] Rancher 기본 입문
[AWSKRUG 컨테이너 소모임] Rancher 기본 입문[AWSKRUG 컨테이너 소모임] Rancher 기본 입문
[AWSKRUG 컨테이너 소모임] Rancher 기본 입문
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Scale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 servicesScale Kubernetes to support 50000 services
Scale Kubernetes to support 50000 services
 
Open source APM Scouter로 모니터링 잘 하기
Open source APM Scouter로 모니터링 잘 하기Open source APM Scouter로 모니터링 잘 하기
Open source APM Scouter로 모니터링 잘 하기
 

Viewers also liked

Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Jukka Zitting
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repositoryJukka Zitting
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Jukka Zitting
 
MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStoreJukka Zitting
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical modelJukka Zitting
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorJukka Zitting
 
/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repository/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repositoryJukka Zitting
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CIJukka Zitting
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache JackrabbitJukka Zitting
 
The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6Jukka Zitting
 
Enterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLIEnterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLIGokhan Atil
 
JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?connectwebex
 
Oracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsOracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsGokhan Atil
 
新浪云平台的经验和教训
新浪云平台的经验和教训新浪云平台的经验和教训
新浪云平台的经验和教训easychen
 
Shakespeare revealed 02.ppt
Shakespeare revealed 02.pptShakespeare revealed 02.ppt
Shakespeare revealed 02.pptrwakefor
 
Digital thinking
Digital thinkingDigital thinking
Digital thinkingTony Ryan
 
Open Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex SlaghuisOpen Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex SlaghuisKennisland
 

Viewers also liked (20)

Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repository
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3
 
MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStore
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical model
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache Incubator
 
/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repository/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repository
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CI
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache Jackrabbit
 
The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6
 
Enterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLIEnterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLI
 
JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?
 
Oracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsOracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAs
 
新浪云平台的经验和教训
新浪云平台的经验和教训新浪云平台的经验和教训
新浪云平台的经验和教训
 
Good Luck
Good LuckGood Luck
Good Luck
 
Shakespeare revealed 02.ppt
Shakespeare revealed 02.pptShakespeare revealed 02.ppt
Shakespeare revealed 02.ppt
 
Marek
MarekMarek
Marek
 
Digital thinking
Digital thinkingDigital thinking
Digital thinking
 
Open Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex SlaghuisOpen Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
 

Similar to Repository performance tuning

Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesAndrew Kandels
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsFirat Atagun
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performanceintelliyole
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareIndicThreads
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-applicationNguyễn Duy Nhân
 
Apache ignite as in-memory computing platform
Apache ignite as in-memory computing platformApache ignite as in-memory computing platform
Apache ignite as in-memory computing platformSurinder Mehra
 
Unit-4 swapping.pptx
Unit-4 swapping.pptxUnit-4 swapping.pptx
Unit-4 swapping.pptxItechAnand1
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalkgordonyorke
 
Drupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityDrupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityAshok Modi
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OSC.U
 
FOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack WorkshopFOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack Workshopdlieberman
 
Main memory os - prashant odhavani- 160920107003
Main memory   os - prashant odhavani- 160920107003Main memory   os - prashant odhavani- 160920107003
Main memory os - prashant odhavani- 160920107003Prashant odhavani
 

Similar to Repository performance tuning (20)

Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Ch8
Ch8Ch8
Ch8
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performance
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardware
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
 
Apache ignite as in-memory computing platform
Apache ignite as in-memory computing platformApache ignite as in-memory computing platform
Apache ignite as in-memory computing platform
 
Unit-4 swapping.pptx
Unit-4 swapping.pptxUnit-4 swapping.pptx
Unit-4 swapping.pptx
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalk
 
Drupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityDrupal Backend Performance and Scalability
Drupal Backend Performance and Scalability
 
tittle
tittletittle
tittle
 
OS_Ch9
OS_Ch9OS_Ch9
OS_Ch9
 
OSCh9
OSCh9OSCh9
OSCh9
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OS
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
 
FOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack WorkshopFOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack Workshop
 
Main memory os - prashant odhavani- 160920107003
Main memory   os - prashant odhavani- 160920107003Main memory   os - prashant odhavani- 160920107003
Main memory os - prashant odhavani- 160920107003
 

More from Jukka Zitting

Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaJukka Zitting
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Content Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitContent Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitJukka Zitting
 
Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiJukka Zitting
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On SteroidsJukka Zitting
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of JackrabbitJukka Zitting
 

More from Jukka Zitting (9)

Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache Tika
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
NoSQL Oakland
NoSQL OaklandNoSQL Oakland
NoSQL Oakland
 
Content Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitContent Storage With Apache Jackrabbit
Content Storage With Apache Jackrabbit
 
Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache Jackrabbi
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On Steroids
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of Jackrabbit
 
Apache Tika
Apache TikaApache Tika
Apache Tika
 

Recently uploaded

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Recently uploaded (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

Repository performance tuning

  • 1. Jukka Zitting | Senior Developer Repository performance tuning
  • 2. Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Full text indexing Questions and answers 2
  • 3. Performance tuning steps Step 1: Identify the symptom Create a test case that consistently measures current performance Define the performance target if current level unacceptable Make sure that the test case and the target performance are really relevant Step 2: Identify the cause Main suspects: Hardware, Repository, Application, Client Revise the test case until the problem no longer occurs;for example: Selenium, JMeter, JUnit, Iometer Step 3: Identify/implement possible solutions Change content, configuration, code or upgrade hardware Step 4: Verify results If target not reached, iterate the process or revise the goal 3
  • 4. Repository internals 4 Data Store Persistence Manager Query Index Cluster Journal
  • 5. Data Store Content-addressed storage for large binary properties Arbitrarily sized binary streams Addressed by MD5 hash String properties not included, use UTF-8 to map to binary Fast delivery of binary content Read directly from disk Can also be read in ranges Improved write throughput Multiple uploads can proceed concurrently (within hardware limits) Cheap copies Garbage collection used to reclaim disk space Logically shared by the entire cluster 5 Data Store
  • 6. Cluster Journal Journal of all persisted changes in the repository Content changes Namespace, nodetype registrations, etc. Used to keep all cluster nodes in sync Observation events to all cluster nodes (see JackrabbitEvent.isExternal) Search index updates Internal cache invalidation Old events need to be discarded eventually No notable performance impact, just extra disk space Keep events for the longest possible time a node can be offline without getting completely recreated Logically shared by the entire cluster Writes synchronized over the entire cluster 6 Cluster Journal
  • 7. Persistence Manager Identifier-addressed storage for nodes and properties Each node has a UUID, even if not mix:referenceable Essentially a key-value store, even when backed by a RDBMS Also keeps track of node references Bundles as units of content Bundle = UUID, type, properties, child node references, etc. Only large binaries stored elsewhere in the data store Designed for balanced content hierarchies, avoid too many child nodes Atomic updates A save() call persists the entire transient space as a single atomic operation One PM per workspace (and one for the shared version store) Logically (often also physically) shared across a cluster 7 Persistence Manager
  • 8. Query Index Inverse index based on Apache Lucene Flexible mapping from terms to node identifiers Special handling for the path structure Mostly synchronous index updates Long full text extraction tasks handled in background Other cluster nodes will update their indexes at next cluster sync Everything indexed by default Indexing configuration for tweaking functionality, performance and disk usage One index per workspace (and one for the shared version store) Not shared across a cluster, indexes are local to each cluster node See http://wiki.apache.org/jackrabbit/Search#Search_Configuration 8 Query Index
  • 9. Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Indexing configuration Questions and answers 9
  • 10. Basic content access Very fast access by path and ID Underlying storage addressed by ID, but path traversal is in any case needed for ACL checks Relevant caches: Path to ID map (internal structure, not configurable) Item state caches (automatically balanced, configurable for special cases) Bundle cache (default fairly low, increase for large deployments) Also some PM-specific options (TarPM index, etc.) Caches optimized for a reasonably sized active working set typical web access pattern: handful of key resources and a long tail of less frequently accessed content, few writes Performance hit especially when updating nodes with lots of child nodes FineGrainedISMLocking for concurrent, non-overlapping writes 10
  • 11. Example: Bundle cache configuration 11 <!-- In …/repository/worspaces/${wsp.name}/workspace.xml --> <Workspace …> <PersistenceManager class=“…"> <paramname="bundleCacheSize" value="8"/> </PersistenceManager> </Workspace>
  • 12. Batch processing Two issues: read and write Reading lots of content Tree traversal the best approach, but will flood caches Schedule for off-peak times Add explicit delay (used by the garbage collectors) Use a dedicated cluster node for batch processing Writing lots of content (including deleting large subtrees) The entire transient space is kept in memory and committed atomically Split the operation to smaller pieces Save after every ~1k nodes Leverage the data store if possible 12
  • 13. Clustering Good for horizontally scaling reads Practically zero overhead on read access Not so good for heavy concurrent writes Exclusive lock over the whole cluster Direct all writes to a single master node Leverage the data store Note the cluster sync interval for query consistency, etc. Session.refresh() can be used to force a cluster sync 13
  • 14. Query performance What’s really fast? Constraints on properties, node types, full text Typically O(n) where n is the number of results, vs. the total number of nodes What’s pretty fast? Path constraints What needs some planning? Constraints on the child axis Sorting, limit/offset Joins What’s not yet available? Aggregate queries (COUNT, SUM, DISTINCT, etc.) Faceting 14
  • 15. Join engine 15 SELECT a.* FROM [nt:unstructured] AS a JOIN [nt:unstructured] AS b <PersistenceManager class=“…"> <paramname="bundleCacheSize" value="8"/> </PersistenceManager> </Workspace>
  • 16. Indexing configuration Default configuration Index all non-binary properties Index binary jcr:data properties (think nt:file/nt:resource) Full text extraction support for all major document formats Full text extraction from images, packages, etc. is explicitly disabled CQ5 / WEM comes with default aggregate indexing rules for cq:Pages, etc. Why change the configuration? Reduce the index size (by default almost as large as the PM) Enable features like aggregate indexes Assign boost values for selected properties to improve search result relevance 16
  • 17. Indexing configuration How to change the configuration? indexing_configuration.xml file in the workspace directory Referenced by the indexingConfiguration option in the workspace.xml file See http://wiki.apache.org/jackrabbit/IndexingConfiguration Example: 17 <?xml version="1.0"?><!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd"><configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"> <aggregateprimaryType="nt:file"> <include>jcr:content</include> </aggregate> </configuration>