Scale your Alfresco Solutions


In this session, we'll discuss architectural, design and tuning best practices for building rock-solid, scalable Alfresco solutions. We'll cover the typical use cases for highly scalable Alfresco solutions, such as massive injection and high concurrency, and introduce the 3.3 and 3.4 Transfer and Replication services for building complex, high-availability enterprise architectures.

  • We won’t be going into details on how to set up clustering and the web tier
  • [Check with AH the background indexing stuff, i.e. is it indexing or extraction that exceeds 20 ms]
  • These are typical; specifics will obviously vary.
  • [Derek]
  • [PM – how does the custom loading fit into this??]
  • Scale your Alfresco Solutions

    1. Mike Farman, Product Manager, Alfresco; Peter Monks, Director, Professional Services, Alfresco; Derek Hulley, Senior Engineer, Alfresco
    2. Many areas to consider... • Core Repository • Web-tier load balancing and caching • Scale up/scale out: vertical vs. horizontal • Components tuning • Replication strategies (3.4) • Profiling and benchmarking • ... We're going to focus on the Core Repository
    3. What happens when you create a node? [Diagram] 1 Begin Transaction • 2 Write stream to disk • 3 Create node in DB • 4 Update DB content URL • 5 Begin Commit • 6 Transform (extract) Text • 7 Update Index (Props & Content) • 7a Index Fulltext (Background) • 8 DB Commit (Transaction ID for Index Tracking) • 9 Add to L2 Cache • Note: Content indexing is automatically moved to the background if text extraction exceeds 20 ms
    4. What happens when you query for nodes? [Diagram] 1 Query (Lucene) • 2 Results Set • 3 Batch Pre-fetch • 4 In Cache • 4a DB Fetch • 5 Result Set • 6 Check Permissions (limited by Max Permission Checks and Timeout) • 7 Deliver Results
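The permission-check step in the query flow is bounded by a maximum number of checks and a timeout. The sketch below illustrates that idea only; it is not Alfresco's implementation. The function name `filter_by_permission` and the `can_read` callback are hypothetical, and the default limits simply mirror the `system.acl.maxPermissionChecks` and `system.acl.maxPermissionCheckTimeMillis` values quoted later in these slides.

```python
import time


def filter_by_permission(candidates, can_read, max_checks=10000, max_millis=10000):
    """Return (permitted_nodes, truncated).

    Hypothetical illustration: stop checking once either the check budget
    or the time budget is used up, and report that the result was cut short.
    """
    start = time.monotonic()
    permitted = []
    for checks_done, node in enumerate(candidates):
        elapsed_ms = (time.monotonic() - start) * 1000
        if checks_done >= max_checks or elapsed_ms >= max_millis:
            # Budget exhausted: deliver what we have so far.
            return permitted, True
        if can_read(node):
            permitted.append(node)
    return permitted, False


# Usage: 25 candidate nodes, only even ids are readable, budget of 10 checks.
nodes = list(range(25))
readable, truncated = filter_by_permission(nodes, lambda n: n % 2 == 0, max_checks=10)
```

Under these assumptions, a user who can read only a small fraction of the candidates may see a truncated result set, which is one reason the slides suggest testing queries as different users and not only as admin.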
    5. What happens when you read a node's content? [Diagram] 1 Node Read Request • 2 Cached • 3 DB Lookup • 4 Fetch Content • 5 Stream Response
    6. Example Use Cases: • UC01: Bulk Loading • High batch throughput, ongoing • e.g. scanning, archival solutions, systems of record • Migration • One-off migration to Alfresco from legacy system • Then UC02... • UC02: Enterprise Collaboration Platform • Concurrent users, variety of interfaces • e.g. Team/Project Collaboration, Document/Knowledge Management
    7. Typical Characteristics • Large number of documents and high throughput • Tens of thousands of documents injected per day, often during nightly hours • Tens of millions of documents per year • Low user concurrency • 100-1,000 users (read-only access) • Application profile: System of Record • End users mostly search & read • Document formats: PDF, TIFF, JPG (i.e. no full text indexing) • Typically fixed metadata • Little or no version control • Few to no rules, actions, workflows, content transformations • Client Interfaces • Share/Explorer or Custom e.g. Web Scripts, CMIS • Typically little CIFS/WebDAV/FTP
    8. Primary Objective is to Maximise Throughput • Parallel processing • Load nodes simultaneously • Avoid unnecessary in-transaction processing • In-transaction services are often not required when loading • e.g. Transformation, Indexing • Disable unneeded services • Many standard services are not required when loading • Minimise network and file I/O operations • Get source content as close to server storage as possible • Always benchmark and tune... • JVM, Network, Threads, DB Connections...
    9. Architectural considerations • Creation is CPU, memory and network intensive • Always 64 bit • Rule of thumb: Prefer scale up over scale out, for simpler deployment and management • Rule of thumb: Get the content as close as possible to Alfresco • Nature of the data set (i.e. batches) is KEY • If batches are sequential -> minimize time-per-batch • Scale up in CPU and memory • If batches are parallelizable -> maximize number of batches processed • Scale out with multi-threaded uploads • Consider dedicated server(s) for ingestion • Use production servers for migration use case and then reconfigure • Design content storage around your data • How can you get the source content as close as possible to repository content storage? • Note: Avoid SPARC T and related series • Highly parallel but not suited for atomic-heavy serial operations
    10. Tuning best practices - JVM Tuning – Application Server • 64 bit • Make NewSize as large as possible to avoid spill over to OldGen • Pay attention to the machine capacity i.e. • Threads • CPU Utilization • I/O • See Sample JVM Config (64-bit, dual 2.6GHz Xeon / dual-core per CPU, 8GB RAM environment): -server -Xss1M -Xms2G -Xmx3G -XX:NewSize=1G -XX:MaxPermSize=256M
    11. [Figure: Bad vs. Good]
    12. Tuning best practices – I/O • Network • Alfresco to Database is key • Latency is key e.g. 10 ms is the absolute maximum • JDBC fetch size should be 150 • See BP-1_Alfresco_Environment_Validation_and_Day_Zero_Configuration • Alfresco to storage (if remote) • If possible, avoid it completely for file transfers: stage content on local disks • Use a dedicated network for storage e.g. Fibre Channel • Incoming to Alfresco: typically not relevant for the bulk loading use case • Disk • Lucene index operations are disk I/O intensive • Fast reads/writes i.e. local disk • Avoid indexing if not required • Avoid unnecessary content file copying • Stage content on local disks • Consider setting the cm:content property directly e.g. • contentUrl=store://mypath/mydocument.docx|mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document|size=51142|encoding=UTF-8|locale=en_GB_
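The `cm:content` value shown on this slide is a pipe-separated list of `key=value` fields. As a rough sketch, a loader could assemble that string as below. The helper name `content_data` is hypothetical, the field order and the trailing underscore in the locale are copied from the slide's example, and nothing here calls an Alfresco API.

```python
def content_data(content_url, mimetype, size, encoding="UTF-8", locale="en_GB_"):
    """Build a content property string in the shape shown on the slide.

    Hypothetical helper: field names and order follow the slide's example.
    """
    fields = [
        ("contentUrl", content_url),
        ("mimetype", mimetype),
        ("size", str(size)),
        ("encoding", encoding),
        ("locale", locale),
    ]
    return "|".join(f"{key}={value}" for key, value in fields)


# Usage: reproduce the example value from the slide.
DOCX = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
value = content_data("store://mypath/mydocument.docx", DOCX, 51142)
```

Setting the property this way only makes sense when the file has already been staged at the location the content URL points to, which is the point of the "stage content on local disks" advice.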
    13. Tuning best practices - Database • Connections: relevant if you are loading concurrently • See BP-1_Alfresco_Environment_Validation_and_Day_Zero_Configuration • DB Indexes & Statistics • Plan your batch loads to allow for periodic statistics maintenance • Make sure the database hardware/software is sized appropriately e.g. • Log sizes, flush on transaction commit, cache tuning, lock management... • Use of multiple physical volumes/RAID... • All databases provide many options to optimise performance • Get a DB administrator or partner involved
    14. Tuning best practice - Repository Services • Force background indexing • Everything: index.tracking.disableInTransactionIndexing=true • Just Content: lucene.maxAtomicTransformationTime=0 • Is content indexing required at all? • DoNotIndex aspect • “Run As” system user to avoid permission checking
    15. Tuning best practice - Repository Services • Use an optimised custom bulk loader • Process docs in batches: not 1 doc per transaction, and not 1 transaction for the entire content set • Example: 100 documents per batch • Use Foundation (Java) API if possible • Design multi-threaded import code • Partition your data set so you can use multiple threads loading in different areas • Scale up CPU accordingly • Consider direct APIs (e.g. “NodeService” vs “nodeService”) • Public services are heavily wrapped with interceptors for transactions, auditing, permissions, multilingual translations, etc. • Disable behaviours • Rules evaluations, cm:auditable, versioning, quotas (system.usages.enabled=false) • Use proper transaction demarcation • Complete all operations on a node in a single transaction • Batching: group multiple updates in a single transaction • Avoid mixing reads and writes • See session CS2-Repository_Internals for more details on API specifics
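The batching and multi-threading advice on this slide can be sketched independently of any Alfresco API. In the example below, `load_batch` is a hypothetical stand-in for "create every node in this batch inside one transaction"; a real loader would use the Foundation (Java) API instead. The batch size of 100 comes from the slide, while the thread count of 4 is an arbitrary assumption to be tuned by benchmarking.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 100  # "Example: 100 documents per batch" from the slide


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_batch(batch):
    """Hypothetical placeholder: one transaction per batch.

    A real implementation would begin a transaction, create all nodes
    in the batch, and commit once. Here we only report the batch size.
    """
    return len(batch)


def bulk_load(documents, threads=4):
    """Load batches in parallel and return the number of documents handled."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(load_batch, batches(documents)))


# Usage: 1,050 documents become 10 full batches plus one batch of 50.
docs = [f"doc-{i}.pdf" for i in range(1050)]
loaded = bulk_load(docs)
```

Partitioning the data set so that each thread writes to a different area of the repository, as the slide recommends, would be handled by how `documents` is split before it reaches `bulk_load`.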
    16. Tuning best practices – Repository Services • Disable modified timestamp propagation to parent folders • system.enableTimestampPropagation=false (default) • Deleting large numbers of nodes • Skip deleted items (archive) by adding the sys:temporary aspect to your content before deletion • Partition your content within the repository • Depends on read access requirements • Consider partitioning when a space would hold more than 2,000 nodes, if users browse space children • Note: Performance is much improved in later releases (3.3.3, 3.4); test for your use case
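One simple way to keep spaces under the 2,000-node guideline is to derive the target folder from a running sequence number. This is a sketch of that idea only: the `/imports/bucket-NNNNN` naming scheme and the `folder_for` helper are assumptions made for illustration, and a real partitioning scheme should follow your read-access requirements (for example, by date or business key).

```python
from collections import Counter

MAX_CHILDREN = 2000  # guideline from the slide


def folder_for(sequence_number, max_children=MAX_CHILDREN):
    """Map the n-th loaded document to a folder holding at most max_children nodes."""
    bucket = sequence_number // max_children
    return f"/imports/bucket-{bucket:05d}"


# Usage: 5,000 documents are spread over three folders.
counts = Counter(folder_for(n) for n in range(5000))
```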
    17. Scale Out Using Dedicated Bulk Load Server(s) • Alfresco can support a non-clustered, injection-only tier • Objective: Separate the input write process from the front-end read load • Solution: Dedicated injection tier pointing to the same DB/content store(s) as the front-end servers. No need to cluster caches from this tier with the front end. Index properties and/or content in the background; indexes will catch up from DB transactions. • Benefits: No cache update/invalidation overhead. Indexing does not block the loading process
    18. Bulk load server(s) are not clustered but share storage and DB; production servers will 'catch up' via index tracking [Diagram] • Bulk Load Process (creates only) -> Bulk Load A, Bulk Load B • Runtime Clients -> Production A, Production B, Production C • Each server: Tomcat, EHCache, Lucene Index • Shared: Database (MySQL), Content Store
    19. Load Server(s) Configuration Tips • Bulk Load Server(s) • To exclude server(s) from cluster: • Do not set cluster name for bulk load servers in • Force background indexing in the local using: • Everything: index.tracking.disableInTransactionIndexing=true • Just Content: lucene.maxAtomicTransformationTime=0 • Note: The load process should perform creates only, no updates or reads • Production Server(s) • Ensure index tracking is enabled: • index.tracking.cronExpression=0/5 * * * * ? • index.recovery.mode=AUTO
    20. Example: In-transaction vs. Background Indexing • 10,000 docs, 1,000 folders • 50 KB Word documents • FTP with 10 sessions • Laptop • Foreground Indexing: 33 mins • Background Indexing: 5 mins
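To put the two timings in perspective, the arithmetic below converts them into documents per second and a speed-up factor. It counts only the 10,000 documents (not the 1,000 folders), which is an assumption about how the figures should be read.

```python
def docs_per_second(docs, minutes):
    """Average load rate over the whole run."""
    return docs / (minutes * 60)


foreground = docs_per_second(10_000, 33)  # in-transaction indexing
background = docs_per_second(10_000, 5)   # background indexing
speedup = background / foreground         # equals 33 / 5

print(f"foreground: {foreground:.2f} docs/s")
print(f"background: {background:.2f} docs/s")
print(f"speed-up:   {speedup:.1f}x")
```

On this single laptop test, moving indexing to the background raises throughput from roughly 5 to roughly 33 documents per second, about 6.6 times faster.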
    21. UC02: Enterprise Collaboration Platform
    22. Requirements • High (and potentially highly distributed) user concurrency • 1,000s-10,000s of users (read & write) • Medium/High number of documents • 10,000-1 million+ documents • 1,000 document updates per day • Complex enterprise content and permission models • Multiple content models/Dynamic ACL • Versioning and full text indexing on all documents • Document types: Office, drawing, images • Advanced content management • Multiple rules and actions • Heavy use of content transformations/workflow • Interfaces (All) • Share, WebDAV, CIFS...
    23. Architectural considerations • Fully fledged platform deployment • Need to consider maintenance window • Scale out Share independently from Repo • Front and intermediate Load balancer/Web Cache layers • Read/write split and scheduled repository exclusion for maintenance • Scale out transformation server • Enterprise only: JOD OpenOffice subsystem • Scale out and up infrastructure • Cluster CIFS with DFS (Distributed File System) • All HTTP based protocols scale seamlessly (SSP on port 7070) • Balance multi-CPU (scale up) and multi-node clusters (scale out) • Overhead of index tracking
    24. Design best practices • Distribute your content within the repository • Otherwise search and retrieval performance degradation is likely • Use versioning and indexing where appropriate, not just because it's there • e.g. don't simply apply cm:versionable to the full cm:content • Modelling • Prefer aspects over types • Remember aspects support inheritance as well • Content Model indexing options • Tune what you need to index • Quotas (aka Usages) • Might save your repo from content explosion but also have an overhead!
    25. Tuning best practices – Note: Also see bulk load use case! • RDBMS • Number of connections much more important for this use case • Formula: HTTP Worker Threads + 75 per cluster node • For Tomcat defaults this is 275 • Cache Configuration • L2 Cache: increase with RAM to include more objects in cache • Use ehcache tracing tool to identify which caches have low hit ratios and increase if you have available memory • See for details • Alfresco Configuration optimization • VFS thread pool tuning (default: <threadPool init=“25” max=“50” />) • Tune ACLs and preload common searches (if needed) • system.acl.maxPermissionCheckTimeMillis=10000 • system.acl.maxPermissionChecks=10000 • Query via node browser as different users, not only admin • Consider bulk loading large user bases (10,000s) to a single (un-clustered) node and then clustering • Disable eager home folder creation • home.folder.creation.eager=false in alfresco-global.properties • Use multi-threaded and incremental LDAP sync once initial sync has been completed • Differential sync is the default • Lucene Tuning • lucene.maxAtomicTransformationTime=20 • Monitor the network performance when adding nodes to a cluster • Watch for ehcache waiting for the network via thread dumps • Consider disabling some/all of the L2 caches
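The connection formula on this slide is simple enough to check by hand, but a small helper makes the sizing explicit. The sketch assumes Tomcat's default of 200 HTTP worker threads, which gives the 275 quoted on the slide. Reading "per cluster node" as "each node needs its own pool of that size, so the database must accept the sum" is an interpretation, not something the slide states outright, and both function names are hypothetical.

```python
POOL_OVERHEAD = 75           # fixed allowance from the slide's formula
TOMCAT_DEFAULT_THREADS = 200  # Tomcat's default maxThreads


def pool_size_per_node(http_worker_threads, overhead=POOL_OVERHEAD):
    """Connections one repository node should be allowed to open."""
    return http_worker_threads + overhead


def database_connections_needed(nodes, http_worker_threads, overhead=POOL_OVERHEAD):
    """Assumed total the database must accept across the whole cluster."""
    return nodes * pool_size_per_node(http_worker_threads, overhead)


per_node = pool_size_per_node(TOMCAT_DEFAULT_THREADS)                 # 275
three_nodes = database_connections_needed(3, TOMCAT_DEFAULT_THREADS)  # 825
```

The per-node figure is the kind of value that would go into `db.pool.max`, and the database's own connection limit would need to be at least the cluster-wide total.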
    26. Example: Production Cluster Install [Deployment diagram] • Clients: HTTP clients (e.g. Share) via an HTTP load balancer; Windows ECM CIFS clients via alfrescocifs with DFS round robin • Active Directory: user/group sync, NTLM authentication • Application servers: alfappsrv01 (Tomcat 1) and alfappsrv02 (Tomcat 2) with clustered EHCache; each has a local alf_data with Lucene index (d:\alf_store\lucene-indexes) and content store (d:\alf_store\contentstore) • Local & shared content store: replicating content store with in & outbound replication between the local content store and the shared content store on the SAN • Database: JDBC to oraclecluster; alfclustsrv01 (Oracle 1) and alfclustsrv02 (Oracle 2) with failover, MSCS cluster • SAN: shared content store sharedContentStore (alfdataDatastore); Oracle data (o:\oradata\alfresco), control (o:\oradata\alfresco) and log files (L:\oradata\alfresco); Oracle backup (o:\flash_recovery_area); Lucene index backup (alfdataHold)
    27. Replication (3.4) offers new deployment options • Replication may be appropriate for specific contexts • Provides selective replication of content between distinct Alfresco repositories • On demand or scheduled via Replication Jobs • Reporting and tracking of Replication Jobs • Read and viewing performance: content is served from a local server
    28. For any system... • Do not use the OOTB settings for application server, database, etc.; you must always tune Alfresco for your use case • Balance your resources • Separate tiers for database, content, app servers • Indexes should always be on fast, local disk e.g. not NFS mounts, USB drives, etc. • Run on a supported stack • e.g. there are issues with JDK 1.6u10, so use JDK 1.6u20; use MySQL 5.1.39 or later • Don't starve your database of connections: • db.pool.max=XXX • Use appropriate application server worker threads • Configuration details are application server specific e.g. Tomcat: server.xml • When clustering, use JGroups and Unicast • Use the latest Alfresco version/service pack e.g. • 3.3.3, 3.4
    29. Things you should NOT change • The database transaction isolation level • Use defaults for all databases except MS SQL Server • FYI: SQL Server should be: • db.txn.isolation=4096 • ALTER DATABASE alfresco SET ALLOW_SNAPSHOT_ISOLATION ON; • The ehcache default configuration i.e. replicate async • The Lucene indexing defaults, unless you know what you are doing and why! • Note: Also do not do a full index rebuild unless you know what was wrong in the first place! • Use the index checker
    30. Benchmark your solutions
    31. Alfresco Benchmarks • Alfresco Benchmark Tools • alfresco-bm – • SimpleInjector – (check • For CIFS loading -> JMeter + SMB mount • Alfresco Benchmark Results • Unisys benchmark results • JCR Benchmarks • WIP • “Scale your Alfresco Solutions” (in • More platform benchmarks ongoing: watch this space!
    32. Profiling your Alfresco solution • Alfresco Application Profiling • JMX (for Enterprise only, see Admin Guide) • Audit Surf • Nagios integration • Infrastructure Profiling • VisualVM (JVM) • Thread Dump Analyzer • YourKit (JVM) • Wireshark (Network) • MySQL Query Profiler (DBMS)
    33. Q/A & Feedback • Any Questions? • Share your experiences (good and bad) with us so we can all learn! • Successful scaled up/out architectures • Limitations, bottlenecks • Use case parameters => Implementation => Results • What worked, what didn't