Tarpm Clustering


Published in: Technology

  1. CRX 1.4 TarPersistenceManager and Clustering with TarPersistenceManager
     Speaker: Honwai Wong, SSE
     Duration: 45 min
     Feedback: techsummit@day.com
     Day Technical Summit 2008 (www.day.com)
  2. Agenda
     • TarPM (TarPersistenceManager)
       • Functionality
       • Configuration
       • Optimization
       • Hot Backup
       • Migration
     • TarPM (TarPersistenceManager) Clustering
       • Architecture
       • Global Data Store
       • Setup
       • Configuration
  3. TarPM Functionality
     • Disk-based PersistenceManager
     • Uses the standard tar file format (POSIX standard)
     • Append-only write operations, thus extremely efficient
     • Particularly suitable for use cases with high data creation and modification rates
     • Takes advantage of the key-value pair data structure of CRX
     • Maintains index files for fast access
     • Hot backup capability
  4. TarPM Configuration
     • TarPM configuration is done per workspace
       • e.g. <crx_home>/workspaces/crx.default/workspace.xml
     • No mandatory parameters; all are preset with default values

     <PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager" />
  5. TarPM Configuration - Parameters

     | Parameter      | Description                                                                                                      | Default                     |
     |----------------|------------------------------------------------------------------------------------------------------------------|-----------------------------|
     | localPath      | The directory where local files are stored. This can be an absolute or relative path.                            | base directory of workspace |
     | maxFileSize    | If the current data file grows larger than this number (in MB), a new data file is created.                      | 64                          |
     | maxIndexBuffer | After an abnormal termination, at most this much data (in MB) needs to be scanned to re-create the tar entry index. | 32                       |
     | optimizeSleep  | Number of milliseconds to wait after each optimization step.                                                     | 1                           |
  6. TarPM Optimization
     • Append-only operation leads to increased disk usage
       • Data in the tar files is never overwritten
       • A delete appends a zero-length entry
     • The optimization task copies active data from old tar files into new ones and subsequently deletes the old tar files
     • Different modes of operation are supported
     • Recommended to run during times of low system usage
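The effect of append-only writes and of the optimization task can be illustrated with a toy in-memory model. This is a minimal sketch, not the TarPM implementation: the class name `AppendOnlyStore`, the `max_entries_per_file` limit (a stand-in for `maxFileSize`), and the rule that any zero-length value counts as a delete are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyStore:
    """Toy model of an append-only log: entries are only ever added."""
    max_entries_per_file: int = 4              # stand-in for maxFileSize
    files: list = field(default_factory=lambda: [[]])

    def _append(self, key, value):
        if len(self.files[-1]) >= self.max_entries_per_file:
            self.files.append([])              # roll over to a new data file
        self.files[-1].append((key, value))

    def put(self, key, value: bytes):
        self._append(key, value)

    def delete(self, key):
        self._append(key, b"")                 # a delete is a zero-length entry

    def live_data(self):
        """Replay all entries in order; the last entry per key wins."""
        live = {}
        for data_file in self.files:
            for key, value in data_file:
                if value:
                    live[key] = value
                else:
                    live.pop(key, None)
        return live

    def entry_count(self):
        return sum(len(data_file) for data_file in self.files)

    def optimize(self):
        """Copy live entries into fresh files, then drop the old files."""
        live = self.live_data()
        self.files = [[]]
        for key, value in live.items():
            self._append(key, value)
```

In this model, overwriting or deleting a key never shrinks the store; only `optimize()` reclaims the space held by superseded entries, which is why the slide recommends running it when the system is quiet.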
  7. TarPM Optimization - Modes
     • Manually trigger optimization from CRX Explorer
     • Place a file called optimize.tar in the data directory
       • TarPM detects this file and starts optimization
       • optimize.tar is renamed to optimizeNow.tar
       • After optimization has finished, optimizeNow.tar is deleted automatically
       • Stop the task by deleting this file
       • Automate using a cron job
     • Offline optimization using the command-line tool
       • java -cp <jars> com.day.crx.persistence.tar.TarUtils -optimize <directory>
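The marker-file protocol above could be scripted, for example from a cron job. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, an empty `optimize.tar` is assumed to be sufficient as a trigger, and "this file" in the stop step is read as `optimizeNow.tar`. The data directory path is whatever your workspace uses; it is not taken from the slides.

```python
from pathlib import Path


def request_optimization(data_dir: Path) -> Path:
    """Create the optimize.tar marker that the TarPM watches for."""
    marker = data_dir / "optimize.tar"
    marker.touch()                 # assumption: an empty file is enough
    return marker


def optimization_state(data_dir: Path) -> str:
    """Report the state implied by the marker files described above."""
    if (data_dir / "optimizeNow.tar").exists():
        return "running"
    if (data_dir / "optimize.tar").exists():
        return "requested"
    return "idle"


def stop_optimization(data_dir: Path) -> None:
    """Stop a running optimization by deleting the renamed marker."""
    (data_dir / "optimizeNow.tar").unlink(missing_ok=True)
```

A scheduled job would only need to call `request_optimization()` at a quiet time; `optimization_state()` can then be polled to see whether the task has finished.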
  8. TarPM Hot Backup
     • Reminder: tar files are append-only
     • Backup is possible at any time, including at runtime
     • Place a file named stopdelete.tar to prevent the TarPM from deleting old files while backing up
     • Consistent backup by copying the data_*.tar files, sorted by modification date, newest last
       • e.g. ls -tr data_*.tar | xargs -n1 -J % cp -v % /backup
     • When restoring, incomplete transactions are rolled back
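The backup steps above can be sketched as a small script. This is an illustration rather than a Day-provided tool: the function name `hot_backup` is hypothetical, and removing `stopdelete.tar` afterwards to re-enable deletion is an assumption, since the slide only describes placing the file.

```python
import shutil
from pathlib import Path


def hot_backup(data_dir: Path, backup_dir: Path) -> list:
    """Copy data_*.tar files oldest-first while deletion is blocked."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    guard = data_dir / "stopdelete.tar"
    guard.touch()                          # ask the TarPM to keep old files
    try:
        tar_files = sorted(data_dir.glob("data_*.tar"),
                           key=lambda p: p.stat().st_mtime)
        for tar_file in tar_files:         # the newest file is copied last
            shutil.copy2(tar_file, backup_dir / tar_file.name)
    finally:
        # Assumption: removing the guard file re-enables deletion.
        guard.unlink(missing_ok=True)
    return [p.name for p in tar_files]
```

Copying in modification order mirrors the `ls -tr` example: the file that is still being appended to is copied last, so it contains everything the older files refer to.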
  9. TarPM Migration
     • Migration of workspaces using the CRX Console
       • Low-level copy of an existing workspace to a new TarPM workspace
       • Tool is provided with CRX
     • Comprehensive documentation on docs.day.com
       • See section CQ 4.2 / Setup / Migration
     • Part of the migration presentation from Tech Summit 2007
       • http://daycare.day.com/home/day_public/tech_summit_2007.html
       • Author: Dominique Jaeggi, SSE, Day
  10. TarPM Clustering Architecture
      • Master/slave relation between the CRX nodes participating in a cluster
      • Consists of 2 or more CRX cluster nodes with TarPM
      • Synchronization via a file-based Cluster Journal
      • Direct communication between cluster nodes via TCP/IP using HTTP
      • Only the master CRX node writes data
      • Master is elected automatically
      • Automatic fail-over
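The single-writer, journal-based design can be pictured with a toy model. This sketch is an illustration only: the classes `Journal`, `ClusterNode`, and `Cluster` are hypothetical, the journal is an in-memory list rather than files, and the "election" simply promotes the next node in the list because the slides do not describe the real election mechanism.

```python
class Journal:
    """Shared, ordered change log; its length is the global revision."""

    def __init__(self):
        self.entries = []

    def append(self, change):
        self.entries.append(change)
        return len(self.entries)


class ClusterNode:
    def __init__(self, node_id, journal):
        self.node_id = node_id
        self.journal = journal
        self.local_revision = 0          # repository-local revision counter
        self.data = {}

    def sync(self):
        """Replay journal entries this node has not applied yet."""
        for key, value in self.journal.entries[self.local_revision:]:
            self.data[key] = value
        self.local_revision = len(self.journal.entries)


class Cluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.master = self.nodes[0]      # assumption: first node starts as master

    def write(self, key, value):
        """All writes go through the master, which appends to the journal."""
        self.master.journal.append((key, value))
        self.master.sync()

    def fail_over(self):
        """Drop the failed master and promote another node (toy election)."""
        self.nodes.remove(self.master)
        self.master = self.nodes[0]
        self.master.sync()
```

Because every node tracks its own local revision, a slave that was offline catches up by replaying only the journal entries it has not yet seen.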
  11. TarPM Clustering Architecture - Overview
      [Diagram. Labels: Cluster Node A; Cluster Node B (Master); CRX and TAR PM on each node; "Holds Locks"; "Journal posts to master"; "master write" and "read" paths; Master Data TAR (FS); Cluster Journal (FS)]
  12. TarPM Clustering Global Data Store
      • Central storage for binary data, even beyond repository boundaries
      • Only one copy per unique object is kept
      • Storing and reading does not block other users; it is done outside the Persistence Manager
      • Objects in the Data Store are immutable
      • Only the unique data identifiers of existing objects in the Data Store are stored in the Persistence Manager
      • Transactional semantics guaranteed
      • Hot backup by simply copying all files :)
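The "one copy per unique object" and "only identifiers in the Persistence Manager" properties are what a content-addressed store provides. The sketch below is a minimal illustration, not the Jackrabbit Data Store code: the class name is hypothetical, objects are held in memory instead of files, and SHA-256 is an illustrative choice of identifier.

```python
import hashlib


class ContentAddressedStore:
    """Toy data store: identical content is stored once, never modified."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The identifier is derived from the content (illustrative: SHA-256).
        identifier = hashlib.sha256(content).hexdigest()
        # setdefault keeps the existing object, so stored data is immutable.
        self._objects.setdefault(identifier, bytes(content))
        return identifier

    def get(self, identifier: str) -> bytes:
        return self._objects[identifier]

    def __len__(self):
        return len(self._objects)
```

A persistence layer built on top of this would keep only the returned identifier per property, so two nodes holding the same binary share a single stored object.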
  13. TarPM Clustering Global Data Store - Configuration
      • Configured in repository.xml of CRX
        • e.g. <crx_home>/server/runtime/0/_crx/WEB-INF/repository.xml
      • File-based or database-backed
        • org.apache.jackrabbit.core.data.FileDataStore
        • org.apache.jackrabbit.core.data.db.DbDataStore
  14. TarPM Clustering FileDataStore - Config Parameters

      | Parameter       | Description                                                                  | Default                              |
      |-----------------|------------------------------------------------------------------------------|--------------------------------------|
      | path            | The directory where binary objects are stored.                               | repository.home/repository/datastore |
      | minRecordLength | Binary objects bigger than this value (in bytes) are stored in the Data Store. | 100                                |
  15. TarPM Clustering DbDataStore - Config Parameters

      | Parameter       | Description                                                                  | Default |
      |-----------------|------------------------------------------------------------------------------|---------|
      | url             | The database URL used to access the database.                                | -       |
      | user            | Name of the database user.                                                   | -       |
      | password        | Password of the user.                                                        | -       |
      | minRecordLength | Binary objects bigger than this value (in bytes) are stored in the Data Store. | 100   |
      | maxConnections  | The maximum number of open connections.                                      | 3       |
  16. TarPM Clustering Architecture
      [Diagram. Labels: Cluster Node A; Cluster Node B (Master); CRX and TAR PM on each node; "Holds Locks"; "Journal posts to master"; "master write" and "read" paths; Master Data TAR (FS); Cluster Journal (FS); Global Data Store (FS)]
  17. TarPM Clustering Setup
      • Install CRX
      • Configure clustering in repository.xml
      • Configure TarPM to run in cluster mode
      • Set up additional CRX cluster nodes by copying the complete instance
      • Delete the repository-local revision, if present
      • On startup, the CRX cluster node will sync up with the master data based on the journal
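The "copy the instance, then delete the local revision" steps could be scripted roughly as follows. This is a sketch under assumptions: the function name is hypothetical, the revision file is assumed to be named `revision.log` (the name used in the sample repository configuration in this deck), and the file is located by a recursive search because its position inside the instance is not stated. Each copy still needs its own unique cluster id in repository.xml, which this sketch does not edit.

```python
import shutil
from pathlib import Path


def clone_cluster_node(master_home: Path, new_home: Path,
                       revision_file: str = "revision.log") -> list:
    """Copy a complete instance and remove its repository-local revision."""
    shutil.copytree(master_home, new_home)       # new_home must not exist yet
    removed = []
    for path in new_home.rglob(revision_file):   # assumption: located by name
        path.unlink()
        removed.append(path.relative_to(new_home))
    return removed
```

With the local revision gone, the new node has no record of applied changes and, as the last bullet says, syncs up from the journal on startup.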
  18. TarPM Clustering Repository Configuration
      • Enable clustering on a repository-wide level
        • e.g. <crx_home>/runtime/0/_crx/WEB-INF/repository.xml
      • Unique cluster id
      • Cluster Journal

      <Cluster id="cluster-node-1" syncDelay="1">
        <Journal class="org.apache.jackrabbit.core.journal.FileJournal">
          <param name="revision" value="${rep.home}/revision.log" />
          <param name="directory" value="/data/shared/journal" />
        </Journal>
      </Cluster>
  19. TarPM Clustering Repository Configuration - Parameters

      Cluster

      | Parameter | Description                                                                                    |
      |-----------|------------------------------------------------------------------------------------------------|
      | id        | This is required to be a unique literal id of the cluster node.                                |
      | syncDelay | Delay in milliseconds before changes to the journal are automatically detected. Default: 5000  |

      Journal

      | Parameter | Description                                                                  |
      |-----------|------------------------------------------------------------------------------|
      | class     | FQN of the org.apache.jackrabbit.core.journal.Journal interface implementation. |
      | revision  | Location and filename of the repository-local revision counter.              |
      | directory | Shared directory of journal entries and the global revision counter.         |
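Since each node needs the same journal settings but a unique `id`, the `<Cluster>` element can be generated per node. A minimal sketch using only the element and parameter names shown on these slides; the function name `cluster_element` and the node naming scheme are hypothetical, and the output is a fragment to paste into repository.xml, not a complete file.

```python
import xml.etree.ElementTree as ET


def cluster_element(node_id: str, shared_journal_dir: str,
                    sync_delay_ms: int = 5000) -> ET.Element:
    """Build a <Cluster> element for one node (5000 ms is the documented default)."""
    cluster = ET.Element("Cluster", id=node_id, syncDelay=str(sync_delay_ms))
    journal = ET.SubElement(
        cluster, "Journal",
        {"class": "org.apache.jackrabbit.core.journal.FileJournal"})
    # Repository-local revision counter, one per node.
    ET.SubElement(journal, "param",
                  name="revision", value="${rep.home}/revision.log")
    # Shared journal directory, identical on every node.
    ET.SubElement(journal, "param",
                  name="directory", value=shared_journal_dir)
    return cluster


if __name__ == "__main__":
    for number in (1, 2):
        node = cluster_element(f"cluster-node-{number}", "/data/shared/journal")
        print(ET.tostring(node, encoding="unicode"))
```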
  20. TarPM Clustering Workspace Configuration
      • Enable clustering on the TarPM
        • e.g. <crx_home>/crx/workspaces/crx.default/workspace.xml
      • Set the cluster flag
      • Configure local and shared paths

      <PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
        <param name="cluster" value="true" />
        <param name="localPath" value="${wsp.home}" />
        <param name="sharedPath" value="/data/shared" />
      </PersistenceManager>
  21. TarPM Clustering TarPM Configuration - Parameters

      | Parameter  | Description                                       | Default        |
      |------------|---------------------------------------------------|----------------|
      | cluster    | Enables clustering.                               | FALSE          |
      | localPath  | Path where local tar files and index files are stored. | workspace.home |
      | sharedPath | Path where shared data, i.e. tar files, is stored. | workspace.home |
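Before starting a node it can be useful to check that the workspace configuration actually enables cluster mode. The sketch below parses a `<PersistenceManager>` fragment like the one on the previous slide; the function names and the specific checks (cluster flag set, `sharedPath` explicitly configured) are hypothetical sanity checks, not rules enforced by CRX.

```python
import xml.etree.ElementTree as ET


def tarpm_params(persistence_manager_xml: str) -> dict:
    """Return the explicitly configured <param> values as a dict."""
    element = ET.fromstring(persistence_manager_xml)
    if element.tag != "PersistenceManager":
        raise ValueError("expected a <PersistenceManager> element")
    return {p.get("name"): p.get("value") for p in element.findall("param")}


def cluster_config_issues(params: dict) -> list:
    """List reasons why this configuration would not run in cluster mode."""
    issues = []
    # The documented default for "cluster" is FALSE.
    if params.get("cluster", "false").lower() != "true":
        issues.append("cluster flag is not set to true")
    # Hypothetical check: without sharedPath the default is workspace.home.
    if "sharedPath" not in params:
        issues.append("sharedPath is not configured")
    return issues
```

For the sample configuration above, `cluster_config_issues(tarpm_params(xml_text))` returns an empty list; for the parameterless default configuration it reports both issues.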
  22. TarPM Questions?