Sizing an Alfresco infrastructure has always been an interesting topic with many unanswered questions. There is no perfect formula that can accurately define the right sizing for your architecture given your use case. However, we can provide valuable guidance on how to size your Alfresco solution by asking the right questions, collecting the right numbers, and making the right assumptions in a very interesting sizing exercise.
How many Alfresco servers will you need in your Alfresco cluster? How many CPUs/cores do you need on those servers to handle your estimated user concurrency? How do you estimate the sizing and growth of your storage? How much memory do you need on your Solr servers? How many Solr servers do you need to get the response times you require? What are the golden rules that can drive and maintain the success of an Alfresco project?
In this session, we'll discuss architectural, design and tuning best practices for building rock-solid and scalable Alfresco solutions. We'll cover the typical use cases for highly scalable Alfresco solutions, such as massive injection and high concurrency, and also introduce the 3.3 and 3.4 Transfer/Replication services for building complex high-availability enterprise architectures.
Moving Gigantic Files Into and Out of the Alfresco Repository (Jeff Potts)
This talk is a technical case study showing how Metaversant solved a problem for one of their clients, Noble Research Institute. Researchers at Noble deal with very large files which are often difficult to move into and out of the Alfresco repository.
Alfresco DevCon 2019: Performance Tools of the Trade (Luis Colorado)
Discover tips and tools that will help you keep your Alfresco environment in shape. Most of the best tools are free or open source, and this presentation will guide you through the steps to improve the performance of your system.
The objective of this article is to describe what to monitor in and around Alfresco in order to have a good understanding of how the applications are performing and to be aware of potential issues.
Alfresco node lifecycle, services and zones (Sanket Mehta)
This presentation explains the Alfresco node lifecycle (including which Alfresco database tables are affected by node operations such as creation and deletion). It also explains which case-sensitive Alfresco service should be used (nodeService vs. NodeService, searchService vs. SearchService) in order to maintain security in your application. Lastly, it covers zones in Alfresco (authentication-related zones and application-related zones).
Alfresco DevCon 2019 (Edinburgh)
"Transforming the Transformers" for Alfresco Content Services (ACS) 6.1 & beyond
https://community.alfresco.com/community/ecm/blog/2019/02/07/alfresco-transform-service-new-with-acs-61
Alfresco provides various content transformation options across the Digital Business Platform (DBP). In this talk, we will explore the new independently-scalable Alfresco Transform Service. This enables a new option for transforms to be asynchronously off-loaded by Alfresco Content Services (ACS).
https://devcon.alfresco.com/speaker/jan-vonka/
Features of Alfresco Search Services.
Features of Alfresco Search & Insight Engine.
Future plans for the product
---
DEMO GUIDE
[1] Queries: Share > Node Browser
ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'
SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')
[2] Queries: Share > JS Console
var ctxt = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var searchService = ctxt.getBean('SearchService', org.alfresco.service.cmr.search.SearchService);
var StoreRef = Packages.org.alfresco.service.cmr.repository.StoreRef;
var SearchService = Packages.org.alfresco.service.cmr.search.SearchService;
var results = searchService.query(
    StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
    SearchService.LANGUAGE_FTS_ALFRESCO,
    "ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'");
logger.log(results.getNodeRefs());
---
var ctxt = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var searchService = ctxt.getBean('SearchService', org.alfresco.service.cmr.search.SearchService);
var StoreRef = Packages.org.alfresco.service.cmr.repository.StoreRef;
var SearchService = Packages.org.alfresco.service.cmr.search.SearchService;
var results = searchService.query(
    StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
    SearchService.LANGUAGE_CMIS_ALFRESCO,
    "SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')");
logger.log(results.getNodeRefs());
---
var def = {
    query: "ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'",
    language: "fts-alfresco"
};
var results = search.query(def);
logger.log(results);
[3] Queries: api-explorer
{
  "query": {
    "language": "afts",
    "query": "ASPECT:\"cm:titled\" AND cm:title:\"*Sample*\" AND TEXT:\"code\""
  }
}
---
{
  "query": {
    "language": "cmis",
    "query": "SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')"
  }
}
[4] Queries: CMIS Workbench > Groovy Console
rs = session.query("SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')", false)
for (res in rs) {
    println(res.getPropertyValueById('cmis:objectId'))
}
[5] Queries: SOLR Web Console > (alfresco) > Query
/afts
ASPECT:'cm:titled' AND cm:title:'*Sample*' AND TEXT:'code'
---
/cmis
SELECT * FROM cm:titled WHERE cm:title like '%Sample%' AND CONTAINS('code')
---
Alfresco Web Scripts have become an important part of any Alfresco developer's tool kit and in this session we will take a deep dive into how Web Scripts can be used to provide public APIs for Alfresco extensions. After briefly reviewing the anatomy of a Web Script and discussing Alfresco's approach to Service development, we will work through an example that extends Alfresco with a simple service and creates a REST API using Web Scripts.
How to migrate from Alfresco Search Services to Alfresco Search Enterprise (Angel Borroy López)
Presentation on how to move from Alfresco Search Services, based on Apache Solr, to the new Alfresco Search Enterprise, integrated with Elasticsearch and Amazon OpenSearch.
Alfresco has provided an implementation of CMIS ever since the first draft of the specification was announced. It is the CMIS repository that all others are compared to. In this session, you'll learn how Alfresco maps to the CMIS domain model and explore how CMIS services such as query behave through live examples. You'll see how easy it is to build applications against CMIS including the use of unique Alfresco features such as Aspects.
This is the session delivered during the Alfresco Developers Conference in Lisbon, January 2018. Learn everything you need to know to implement a proper backup and disaster-recovery strategy, from a single-server installation with hundreds of documents to a large deployment with multiple nodes, layers, databases and multi-million documents. What is the best way for each case?
Infrastructure, use cases and performance considerations for an enterprise-grade ECM implementation of up to 1B documents on AWS (Amazon Web Services EC2 and Aurora), based on the Alfresco Platform (http://www.alfresco.com), a leading open source Enterprise Content Management system.
BeeCon 2017 (Zaragoza)
https://www.youtube.com/watch?v=6pxYndS-x9A
http://beecon.buzz
A lot of effort has been put into creating a useful set of RESTful APIs in the Alfresco Content Services 5.2 release and beyond. In this talk we'll cover those new APIs and have a sneak peek at what's coming next, including Process, Governance and Search Services.
An overview of the new and enhanced APIs will be discussed and some of the key endpoints demonstrated via Postman. By the time you leave you should have enough knowledge to consume the APIs in your favourite programming language and create a simple client.
These APIs will be the foundation of all new clients developed by Alfresco, so this is a must-attend session!
CUST-10: Customizing the Upload File(s) dialog in Alfresco Share (Alfresco Software)
Many Alfresco projects require customizations to the Share user interface that go beyond normal configuration. This usually involves changing or overriding Repository Web Scripts and Surf Web Scripts, updating JavaScript and CSS files, coding with the Yahoo UI Library, etc. This session will customize the Alfresco Share Upload File(s) dialog and show you how to: add widgets to the Upload File(s) dialog, override Surf Web Scripts, override/update JavaScript and CSS files, write Repository Web Scripts, call Web Scripts from Yahoo UI Library code, and set up a build project for these customizations. The session presents these advanced customization concepts via a hands-on tutorial and slides.
2. Agenda
• Who am I
• Presentation objectives
• Sizing key questions
• Sample architecture
• Sizing the different application layers
3. About me
• Luis Cabaceira, Technical Consultant at Alfresco
• 13 years of field experience
• Helped clients in various industries with architecture and sizing decisions: Government & Intelligence, Banking & Insurance, Manufacturing, Media & Publishing, High Tech
4. Objectives of the presentation
• Explain the key questions that drive sizing decisions
• Share some architecture best practices
• Explain the impact of system usage on the different application layers
5. Considerations
• The information in this presentation is based on Alfresco internal benchmarks as well as our field experience from various customer implementations.
• Sizing predictions should be validated by executing dedicated benchmarks.
• Monitoring your solution will also help validate your sizing predictions, allowing for incremental and continuous improvement.
7. Three required sizing topics
• The most important topics are: use case, concurrent users, and repository documents.
• In many cases organizations are not able to answer the other questions, and you have to make assumptions.
11. Extra sizing topics
The more information you have about your scenario, the more accurate your sizing predictions will be. When available, gather information on the topics explained in the next slides to complete the information-gathering stage.
21. Database sizing
All operations in Alfresco require a database connection, so database performance plays a crucial role in your Alfresco environment. It's vital to have the database properly sized and tuned for your specific use case.
• Content is not stored in the database
• Database size is unaffected by the size of a document's content
• Database size is affected by the number/type of metadata fields
22. Database size calculation
D = number of documents + average number of versions
DA = average number of metadata fields per document
F = number of folders
FA = average number of metadata fields per folder

DBsize = ((D * DA) + (F * FA)) * 5.5 KB

Add 2 KB for each user and 5 KB for each group.
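The formula above can be sketched as a small calculator. This is only an illustration of the slide's arithmetic; the function and parameter names are mine, and the 5.5 KB / 2 KB / 5 KB constants are the rules of thumb quoted in the presentation, not guarantees.

```python
def estimate_db_size_kb(docs_plus_versions, fields_per_doc,
                        folders, fields_per_folder,
                        users=0, groups=0):
    """Estimated Alfresco database size in KB, per the slide's rule of thumb."""
    # ((D * DA) + (F * FA)) * 5.5 KB
    size = (docs_plus_versions * fields_per_doc
            + folders * fields_per_folder) * 5.5
    # Add 2 KB per user and 5 KB per group
    size += users * 2 + groups * 5
    return size

# Example: 1M docs (incl. versions) with 10 metadata fields each,
# 50k folders with 5 fields each, 2,000 users and 100 groups.
print(estimate_db_size_kb(1_000_000, 10, 50_000, 5, 2_000, 100))  # 56379500.0 KB, ~54 GB
```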
25. User-facing nodes sizing
The repository user-facing cluster is dedicated to serving the usual user requests (reads, manual writes and updates). The most important factor that stresses these servers is the number of concurrent users hitting the cluster. Note that these nodes also perform transformations to SWF when a user accesses the document preview, and user uploads will also create thumbnail images.
26. User-facing nodes sizing factors
• Operations: the percentage of browse, download and write operations
• Components, protocols and APIs: which components, protocols and APIs of the product the users are using to access these nodes
• Response times: the detail around response-time requirements
27. User-facing nodes calculations
Based on our field experience and internal benchmark values, and assuming one quad-core CPU per node and a user think time of 30 s:
• 1 Alfresco node for every 150 concurrent users
• Number of nodes = N concurrent users / 150
If we assign two quad-core CPUs to each node:
• 1 Alfresco node for every 300 concurrent users
• Number of nodes = N concurrent users / 300
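The node-count rule of thumb above can be sketched as follows. The 150/300 users-per-node figures come from the slides (quad-core vs. dual quad-core, 30 s think time); the function name and the rounding-up are my own additions.

```python
import math

def user_facing_nodes(concurrent_users, users_per_node=150):
    """Estimated number of user-facing Alfresco nodes, rounded up, minimum 1."""
    return max(1, math.ceil(concurrent_users / users_per_node))

print(user_facing_nodes(1000))        # quad-core nodes -> 7
print(user_facing_nodes(1000, 300))   # dual quad-core nodes -> 4
```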
28. User-facing nodes calculations (continued)
By default Alfresco has only one thread configured for LibreOffice, but the number of threads can be increased using the port numbers in JodConverter. If you have lots of uploads and need faster (concurrent) transformations, consider raising the number of LibreOffice threads. Take this into account in your memory calculations for these servers: each LibreOffice thread typically takes 1 GB. A dedicated transformation server can also be an option.
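As a sketch, the JodConverter thread count mentioned above is typically raised in alfresco-global.properties by listing more ports; one LibreOffice process is started per port. The three-port value below is an illustrative example, not a recommendation, and exact property support varies by Alfresco version:

```properties
# alfresco-global.properties (illustrative values)
jodconverter.enabled=true
# One LibreOffice process per port listed; budget roughly 1 GB of RAM for each.
jodconverter.portNumbers=8100,8101,8102
```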
30. Dedicated tracking nodes sizing
The dedicated tracking nodes exist on each Solr server and are the only reference to Alfresco that Solr knows about. Note that they normally have their own JVM and application-server instance. The dedicated tracking Alfresco instances:
• Perform text extraction of the document's content to send to Solr (an Alfresco API call)
• Allow for local indexing (documents do not have to travel over the wire during tracking)
31. Dedicated tracking nodes sizing (continued)
Normally these nodes do not have high memory and CPU requirements, but sizing will depend on the number of expected uploads and transactions on the system. Indexing (text extraction) impacts CPU, and it is an API call that occurs exclusively on these nodes.
33. Solr nodes sizing factors
Authority structure
Recent benchmark comparisons show that authority structure has a direct and important impact on Solr performance. When sizing Solr we should analyze the types of searches that will be executed and the authority structure of the corresponding use case.
34. Solr nodes sizing factors
Repository Size
It is important to note that repository size should
mostly affect only the average response times of
search operations, and most importantly certain
kinds of global searches that are very sensitive to
the overall size of the repository.
This is the layer most affected by repository
growth.
35. Solr nodes Calculations
Use the online formula published by Alfresco to
calculate the necessary memory for your Solr
nodes.
Note that the formula gives results per core and
considers only one searcher. There can be up to 2
searchers per core, so you should multiply the
values you get from the formula by 4:
Total Memory = Result x 2 cores x 2 searchers
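The multiplication rule above can be written out explicitly; `formula_result_gb` stands for the per-core, single-searcher value returned by Alfresco's online formula:

```python
def solr_total_memory(formula_result_gb, cores=2, searchers=2):
    # The online formula returns memory per core for one searcher;
    # with 2 cores and up to 2 searchers per core, multiply by 4.
    return formula_result_gb * cores * searchers

print(solr_total_memory(5))  # 20 (GB) for a 5 GB per-core result
```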
36. Solr nodes Calculations
To calculate the number of Solr nodes on your
cluster you should consider the search response
times, the frequency of the search requests and the
size of your repository.
There are several possible Solr architectures and
corresponding sizing scenarios depending on your
use case, and tuning also plays an important role:
http://blogs.alfresco.com/wp/lcabaceira/2014/06/20/solr-tuning-maximizing-your-solr-performance/
37. Solr Indexes
You can expect the size of your indexes (when
using full-text indexing) to be between 40% and 60%
of the estimated size of your content store.
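For example, applying the 40-60% rule to an estimated content store size gives a rough planning range (a rule of thumb, not a guarantee):

```python
def fti_index_size_range(content_store_gb):
    # Full-text index size is typically 40-60% of the content store
    return (content_store_gb * 40 / 100, content_store_gb * 60 / 100)

print(fti_index_size_range(1000))  # (400.0, 600.0) GB for a ~1 TB store
```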
39. Bulk ingestion nodes sizing
Sizing the nodes dedicated to content ingestion is
very similar to sizing the tracking nodes: these nodes
do not take any user requests and do not receive
search requests, but they do generate document
thumbnails. Start with 1 quad-core CPU.
They are mainly database and Solr proxies; they read
XML files and memory consumption is low.
The load on this server will mostly impact Solr
indexing and the dedicated tracking instances (text
extraction).
40. Bulk ingestion nodes tuning
Disable unneeded services; many standard services
are not required when bulk loading content.
Check for rules on the target folders that can slow
down ingestion, and disable them when
applicable.
Minimize network and file I/O operations
Get source content as close to server storage as
possible
Tune JVM, Network, Threads, DB Connections.
42. Content Store Sizing
To size your content store, consider:
ā¢ Number of documents (ND)
ā¢ Number of versions (NV)
ā¢ Average document size (DS)
ā¢ Annual growth rate (AGR)
Content Store Size (Y1) = ND * NV * DS
Apply the annual growth rate to project the size for
later years. If you have thumbnails and previews
enabled, also account for the PDF and SWF renditions
and the thumbnails of each document.
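The formula (with AGR compounding for later years) can be sketched as follows, using the sample repository from the next slide (4M documents, 3 MB average) as input; renditions and thumbnails would add to this:

```python
def content_store_size_gb(nd, nv, avg_doc_mb, agr=0.0, years=1):
    """Year-1 size = ND * NV * DS, compounded by the annual growth
    rate (AGR) for later years. Renditions/thumbnails not included."""
    year1_gb = nd * nv * avg_doc_mb / 1024
    return year1_gb * (1 + agr) ** (years - 1)

# Sample from the deck: 4M docs, 1 version, 3 MB average
print(round(content_store_size_gb(4_000_000, 1, 3)))  # ~11719 GB (~11.4 TB)
```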
43. Sample Real Life numbers
SIZING AREA INFORMATION
USE CASE Collaboration
CONCURRENT USERS 600
REPOSITORY 4M docs/ Avg doc size is 3mb/ Ingestion Rate is 75000 docs /Year
ARCHITECTURE High availability required on Solr and Alfresco user facing nodes
AUTHORITY STRUCTURE 8000 named users, 100 groups, user belongs to 4 or 5 groups,
Group hierarchy max depth 4.
OPERATIONS Search/Write/Browse split is 10/10/80
COMPONENTS, PROTOCOLS AND APIs Alfresco Share + DM + Sites, SPP, integration with AD
CUSTOMIZATIONS IMPACT None
BATCH OPERATIONS Ad Synchronization, External loaded dictionary data
RESPONSE TIMES Not Defined
44. Sample Real Life numbers
ALFRESCO NODES CPU JVM MEMORY
2 2 Quad-Core 8 GB
SOLR NODES CPU JVM MEMORY
2 1 Quad-Core 20GB
45. Two common use cases
Collaboration and Backend Repository
46. Collaboration vs Backend Repo
Collaboration Scenario
ā¢ Search is usually just a small portion of the
operations percentage (around 10%)
ā¢ The user authority structure will be complex
ā¢ Most uploads are manually driven
Backend Scenario
ā¢ In most cases, especially for very large
repositories, there won't be full-text
indexing/search
ā¢ Authority structures will in general be fairly
simple
ā¢ Dedicated nodes for ingestion are normally
used
47. Collaboration vs Backend Repo
Collaboration Scenario
ā¢ Repository sizes are usually small or
intermediate
ā¢ Customizations in most cases will concentrate
on the front end (Share)
ā¢ Architecture options are in general the standard
ones provided by Alfresco (cluster, dedicated
index/transformation layers, etc.)
Backend Scenario
ā¢ Repository sizes are usually quite big
ā¢ Custom solution code may live external to
Alfresco, using CMIS, public APIs, etc.
ā¢ Architecture options are normally more complex,
using proxies, clustered and unclustered layers,
sharding of the Alfresco repository, etc.
48. Collaboration vs Backend Repo
Collaboration Scenario
ā¢ Concurrent users are normally quite high
ā¢ The Share interface is used, with very common
usage of SPP, CIFS, IMAP and WebDAV, plus
public interfaces (CMIS) for other clients
(mobile)
Backend Scenario
ā¢ Concurrent users are in general fairly small,
but think times are much smaller than for
collaboration
ā¢ Most of the load should concentrate around the
public API (CMIS) and custom-developed REST
APIs (webscripts)
49. Collaboration vs Backend Repo
Collaboration Scenario
ā¢ Batch operations should mostly be around
human-interaction workflows and the standard
Alfresco jobs
Backend Scenario
ā¢ Batch operations will usually have considerable
importance, including content-injection
processes (bulk or not), custom workflows and
scheduled jobs
50. Database Thread Pool
A default Alfresco instance is configured to use a
maximum of forty (40) database connections.
Because all operations in Alfresco require a database
connection, this places a hard upper limit on the
number of concurrent requests a single Alfresco
instance can service (i.e. 40), across all protocols.
Alfresco recommends increasing the maximum size
of the database connection pool to at least [number of
application server worker threads] + 75.
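With Tomcat's default of around 200 worker threads, that rule gives 275 connections; the setting goes in alfresco-global.properties:

```properties
# alfresco-global.properties
# [application server worker threads] + 75; with Tomcat's default
# of ~200 worker threads this gives 275
db.pool.max=275
```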
51. Database Scaling
Alfresco relies largely on a fast and highly
transactional interaction with the RDBMS, so the
health of the underlying system is vital.
If your project will have lots of concurrent users and
operations, consider an active-active database cluster
with at least 2 machines.
53. User facing nodes Golden Rules
Browsing the repository with a client (Share or
Explorer) performs ACL checks on each item in
the folder being browsed. ACL checks go against
the database and occupy a Tomcat thread, consuming
CPU. Consider keeping a maximum of 1,000
documents in each folder to reduce this overhead.
Add a transformation timeout and configure the
transformation limits to prevent threads from spending
too much time doing transformations.
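As an illustrative sketch of that advice (the property name follows the Alfresco 4.x transformer-limit scheme; verify it against your version's documentation), a global transformation timeout might look like:

```properties
# alfresco-global.properties (illustrative value)
# abort any single transformation after 2 minutes
content.transformer.default.timeoutMs=120000
```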
54. User facing nodes Golden Rules
User operations take application server threads. The
more concurrency, the more threads will be occupied
and the more CPU will be used. Size the maximum
number of application server threads (and the number
of cluster nodes) according to your expected
concurrency.
Include the Alfresco scheduled tasks in your CPU
calculations; those also require server threads and
consume CPU.
55. User facing nodes Golden Rules
Customizations on the repository should be carefully
analyzed for memory and CPU consumption. Check
for unclosed result sets; they are a common memory
leak.
The local repository caches (L2 caches) occupy
server memory, so size them wisely. Check my blog
post on the Alfresco repository caches:
http://blogs.alfresco.com/wp/lcabaceira/2014/05/29/repository-caches/
56. Golden Rules - Solr memory
To minimize memory requirements on Solr:
ā¢ Reduce the cache sizes and check the cache hit
rate.
ā¢ Disable ACL checks, if your use case allows it.
ā¢ Disable archive indexing, if you are not using it.
ā¢ Since everything scales with the number of
documents in the index, add the index control
aspect to the documents you do not want in the
index.
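For example, ACL (permission) checks at query time can be switched off per core in solrcore.properties; this is only safe when every user is allowed to see every result, so treat it as a sketch to adapt:

```properties
# solrcore.properties (per Solr core; only when all users may
# see all results)
alfresco.doPermissionChecks=false
```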
58. Capacity of Alfresco Repository
How can I determine the throughput of my
Alfresco repository server?
An ECM repository is very similar to a database in
regard to the type of events it manages and
executes, especially when we think about
"transactions per second", where a
"transaction" is considered to be a basic
repository operation (create, browse, download,
update, search and delete).
59. The C.A.R.
"The maximum number of transactions that can be
handled in a single second before degrading the
expected performance."
60. Capacity of Alfresco Repository
Three more figures:
EC = the expected concurrency, represented in
number of users
TT = user think time, represented in seconds
ERT = expected/accepted response times
61. The Expected Response Times
OPERATION VALUE WEIGHT LAYER CPU MEM I/O
Download 3 sec 20 Repo
Write/Upload 5 sec 10 Repo/Solr/DB
Delete 3 sec 5 Repo/DB
Browse/Read 2 sec 50 Repo/Solr/DB
Search Metadata 2 sec 10 DB/Solr
Full Text Search 5 sec 5 Solr
It is an object representing the response times being
considered, including the weight of each operation
type and the layer it affects. It takes known
repository actions as arguments.
62. Capacity of Alfresco Repository
With the inclusion of the new variables we can tune
our C.A.R. definition and redefine it as:
"The number of transactions that the server can
handle in one second under the expected
concurrency (EC), with the agreed think time (TT),
while ensuring the expected response times (ERT)."
By configuring the Alfresco audit trail we can
determine our initial throughput (average transactions
per second).
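Under these definitions, the offered load implied by EC and TT gives a first target for the C.A.R. (a back-of-the-envelope figure, not a benchmark):

```python
def offered_tps(ec, tt_seconds):
    # Each of the EC concurrent users issues roughly one request per
    # think-time interval, so the offered load is EC / TT.
    return ec / tt_seconds

print(offered_tps(150, 30))  # 5.0 TPS offered by 150 users thinking 30s
```

For the lab setup later in the deck (EC = 150 users, TT = 30s) this gives 5 TPS, comfortably below the measured C.A.R. of 10-15 TPS.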
63. Predicting the future
Re-evaluate your sizing introducing new factors.
Take a dynamic / use case oriented approach to
identify a formula, built on a system of attributes,
values, weights and affected areas.
Introduce one or more ERT objects representing
use case specific response times and operations
acting as influencers on the server throughput.
64. Simple lab tests results
We've executed some lab tests: one server running Alfresco and another running
the database, with a simple collaboration use case on a repository of 5,000 documents.
Alfresco Server Details
ā¢ Processor: 64-bit Intel Xeon 3.3Ghz (Quad-Core)
ā¢ Memory: 8GB RAM
ā¢ JVM: 64-bit Sun Java 7 (JDK 1.7)
ā¢ Operating System: 64-bit Red Hat Linux
Test Details
ā¢ ERT = The sample ERT values shown before on the presentation
ā¢ Think Time = 30 seconds.
ā¢ EC = 150 users
The C.A.R. of the server was between 10 and 15 TPS during usage peaks. Through JVM
tuning along with network and database optimizations, this number can rise above 25
TPS.
67. Additional Info
Alfresco Blog
http://blogs.alfresco.com/wp/lcabaceira/
Github page
https://github.com/lcabaceira
Thank you
Editor's Notes
Predicting and sizing a system depends on a good understanding of user behavior and how that behavior affects the various layers of your infrastructure.
Performing a regular analysis of the monitoring/capacity-planning data will help you know exactly when and how you need to scale your architecture. The more data that gets indexed inside Elasticsearch along the application life cycle, the more accurate your capacity predictions will be. They represent the "real" application usage and how that usage affects the various layers of your application. This plays a very important role when modeling and sizing the architecture for future business requirements.
Miguel Rodriguez is presenting on how to monitor Alfresco with open source tools; I consider this a very robust solution that you should adopt on your project, as it will help you adjust your sizing predictions based on real data. I've also written a blog post on how to monitor your Alfresco solution with open source tools:
http://blogs.alfresco.com/wp/lcabaceira/2014/09/20/monitoring-your-alfresco-solution/
The best way to perform a sizing exercise is to ask the right questions and gather relevant information that can support the sizing decisions.
Gathering this information should be the first step in order to do an Alfresco-sizing proposal, next I will explain the 3 most important factors to consider while gathering sizing information.
Without the information on those 3 topics it is virtually impossible to execute a sizing exercise. It's useful to build sensible assumptions that provide approximate answers for any sizing input not provided by the organization.
At this stage you should gather all information related to how the solution is intended to be used by the final end users. Understanding the purpose of the system is critical to avoid providing values that do not take into account the many subtle (and not so subtle) details that may be hidden behind the rest of the information provided.
You may find out that some solutions in fact aggregate many different use cases that are fairly different and independent, making the sizing estimate more difficult.
Those situations may also be hard to optimize, both from a configuration and from an architecture point of view. For optimized architectures, an important improvement may be to separate those different solutions and use cases into separate deployments that can each be sized in an independent, dedicated manner.
The concurrent users information is definitely at the top of the importance rank of information to gather for a proper sizing. Systems with very different concurrency will need to be sized in completely different manners, even if they share the same basic architecture skeleton. On the other hand, different concurrencies can even demand different architectures.
Number of concurrent users ( including system users )
Consider detail around average think-time for the number of concurrent users defined
Details on average and peak values.
People often mean considerably different things when they talk about concurrent users, so some agreement is needed on the definition of a concurrent user. Generically, we consider here that a system with an average of N concurrent users means that, on average, over a period of time (30 to 100s think time), the system will receive requests from N different users. For the term concurrent user to have meaning, the think time being considered needs to be clarified. Smaller or bigger think times will in general demand an extrapolation of the value to compare to reference cases.
How many documents in the repository? What types of documents will be used? What are the average document sizes? What is the distribution ratio across the various types? Injection rate: the repository growth or document upload rate. The document types and sizes will impact transformation, text extraction and indexing. The impact of repository size on performance will grow with the ratio of search operations expected in the solution.
Repository size is also dynamic, and the project may expect, even in the near term, to go through different phases: first migrating a legacy repository, then new content roll-out, archival, etc. So sizing estimates should cover the different phases, especially in the short term.
The injection rate has an obvious impact on near-future repository sizes, but also on the capability of the different solution layers to handle the throughput, which stresses not only the content storage and database (metadata extraction/upload/rules) but also the indexing layer. Depending on the amount of document injection happening, this may imply reserving dedicated nodes for injection (clustered or not with the front-end service nodes) and also scaling the Solr layer up/out.
The requirements around uploading and downloading large or small documents, and indexing, transforming different types of documents will vary. This can have architecture consequences. Preference for certain type of protocols and mechanisms for large files for example: FTP, bulk ingestion; or use of dedicated caching solutions for large documents downloads. But this also has consequences at the sizing level. It will be very different having to index large documents than smaller ones, and also between different types.
Next I will explain the rest of the factors to consider (when possible) while gathering sizing information. That information improves the accuracy of the sizing forecast. If it is not available, you should make some assumptions and record those assumptions in your sizing document.
Some typical questions to answer are:
Is high availability needed? Cluster or failover mechanism is enough? Where is load balancing needed?
Is there a need for a dedicated transformation server? Virtualization used ?
Is there a need for independent layers/clusters for certain interfaces (injection, etc)?
Are there reasons to consider a smaller independent layer for Alfresco Share?
Should dedicated caching elements be introduced?
Architecture expectation will constrain what needs to be sized but also will need to get reviewed according to sizing arising from all the different parts of information gathering and analysis.
Do not expect the architecture to be the result of one direction information flow from the organization to you. Most probably the organizations will expect a coordinated answer (proposal/feedback) on both: architecture and sizing.
How do our user identities organize themselves from a permission-control point of view? Basically, the named users, groups and subgroups structure: number of users, number of user groups, group tree/structure depth, average number of users per group, average number of groups per user.
Consider a fairly complex authority structure to be one with a considerable number of hierarchy levels in the group structure, with users belonging to many different groups; consider a very simple authority structure to be one with a very reduced number of users (mostly system users), each belonging to a fairly small number of groups (5 or less).
Real users corresponding to organization charts usually have a complex authority structure (teams, projects, departments, etc). This is a typical scenario for collaboration use cases. While cases where Alfresco is used as backend without permission control responsibilities will have in general the most simple authority structures.
We know from recent benchmark comparisons that the authority structure has a direct and important impact on SOLR performance especially. So when sizing SOLR it is important to consider the importance of search operations for the solution, the types of searches being executed, the repository size and characteristics, and, equally important, the authority structure of the corresponding use case. Collaboration use cases will in general be more demanding (keeping the other mentioned factors constant) than backend use cases with simpler authority structures.
Split ratio between browse/download/search/write; which types of search operations will be executed (global, full text, custom search); time-window frames for the operations' execution; averages and peaks.
Browse operations are "pure" reads that do not execute searches. They won't stress the index/search layer and will basically access the database. Each of these operation types will stress different parts of the architecture to a different extent.
Search will stress the Solr (or Lucene) layer. Indexing (especially with Solr, where it is not atomic) will in general stress the database, but it also has a considerable impact at the search level, competing with search operations for the same resources (disk and memory).
Write/Download should stress the repository/database.
Having, for example, 50% search operations will demand a much bigger Solr layer than having only 5% searches. If the write-operations ratio is comparable to the reads and downloads, we are most of the time talking about cases of massive injection (image capturing, scanning, etc.), and those cases may demand a separate dedicated injection layer, as already mentioned.
For most of the cases we would expect that read/browse and download operations are the lighter ones. We already discussed the write operations specific case for the Injection Rate topic.
Search operations will also differ from one another, and special care should be taken around the type of search operations being executed. Many full-text global searches against very large repositories with complex authority structures will certainly have a very important impact on your Solr layer sizing, while content models with a few untokenized metadata fields may achieve better response times even against a considerably large repository. Finally, when analyzing the information gathered around the operations being executed, remember the time windows for their execution. As an example, saying a system will average 2,000 document downloads a day means something quite different if, in fact, most of them happen during the first 5 hours of the morning.
Which components of Alfresco will be used? Which APIs will be used? Which protocols will be used?
Some protocols will be more demanding than others and require a more scaled-up or scaled-out system to provide acceptable response times. In general, attention should be paid to some problematic protocols; this is generally not Alfresco's fault but due to the protocol definitions themselves. From a load point of view, CIFS and IMAP are good examples of protocols that affect performance, while other protocols and interfaces such as FTP and WebDAV should be considered reasonably light in comparison.
This topic is, in most sizing work, more a matter of properly guiding the organization to the correct and balanced solution, from the combined point of view of their requirements at the functional and sizing levels.
Batch operations are especially important in customized cases, where new processes are executed for custom workflows, custom scheduled jobs, custom rules and transformations, etc. It is also very hard at the initial stages to have a feel for what the future custom batch operations will be. In any case, for situations where we already have information about them (when the organization's solution analysis is already advanced), it is an important topic to take into account.
If considered important enough, batch jobs might be configured to execute on dedicated nodes/layers in order to minimize the effect on the other service layers. Care should also be taken that customizations use the proper Alfresco cluster job-lock service to avoid repeated job execution across nodes in a cluster.
This is definitely a very hard point to measure in the initial phases of the project, and the best candidate for strong requirements around proper dedicated benchmarks and load tests in later phases. At the same time, experience tells us that, all too frequently, it is in fact the most important factor damaging the correct performance of a system.
This is in addition to all the common warnings around best practices in Alfresco coding: preventive expert code review, systematic unit and load tests, training/certification, etc.
The Alfresco expert responsible for sizing should pay attention to the most likely critical points that may affect implementation performance in the future, considering what is known of the solution requirements, and especially contextualize the proposed sizing in adequate terms: assuming optimized custom code, or (if really needed) scaling the sizing up to account for custom code that may not be fully optimized. In any case, for very customized scenarios, all sizing should really be considered as sizing for first test-environment setups, and any final sizing for production environments should come from dedicated benchmark results.
Details around planned customizations
Customizations impact analysis
Details on external system integrations.
If there is already developed code, an evaluation of its performance and possible optimization of that code.
Last but not least, we need to know the expectations around response times for the different operations being executed. Specifically, the important value from a sizing point of view will in general be the required response time at peak concurrency (also check the details on response times for the different operations).
From a scalability point of view, there might be requirements around response-time degradation as the different dynamic factors that constrain the solution vary: concurrent users, but also repository size, architecture components, etc. Those types of requirements may demand a more scaled-up/out sizing to cope with near-term variations that may negatively affect your solution and response times. The same requirements may also demand specific architecture solutions to cope with them.
There is no perfect formula that can be applied to determine the appropriate sizing for every scenario. Sizing and capacity planning depend on several important factors that change from project to project, such as:
Number and type of customizations
External systems integrations
User Behavior
Network topology and infrastructure specific factors
Only with dedicated benchmarks of the final solution on a properly set up test environment (reproducing production conditions as accurately as possible, in functional and load terms) is it possible to provide empirically tested values.
The architecture in the diagram follows some of Alfresco's best practices. It shows a horizontally scalable architecture with Alfresco user-facing nodes (2 or more), dedicated tracking Alfresco repositories local to the Solr servers, and a dedicated node for bulk ingestion of content. Check my blog post on Solr tuning for architecture options: http://blogs.alfresco.com/wp/lcabaceira/2014/06/20/solr-tuning-maximizing-your-solr-performance/#comment-339
We will analyze each layer and provide the sizing variables, relating them to the sizing information we gathered previously.
Be aware that:
Network communication with the database is very important; latency should be < 5 ms.
Note that the formulas do not take into consideration additional space for logging, rollback, redo logs, etc. (Out of the box, the number of metadata fields on documents is about 10 and on folders about 6.)
The Alfresco database and the filesystem where the content store resides are the brain and heart of Alfresco. Those are the 2 most important layers of every Alfresco architecture, if they are not working well, nothing else will work well. If your project will have lots of concurrent users and operations, consider an active-active database cluster with at least 2 machines.
The most common throughput factor is transactions per second. The main constraints on DB performance in a transactional system are usually the underlying database files and the log file. Both are factors because they require disk I/O, which is slow relative to other system resources such as CPU.
In the worst-case scenario (big databases, large numbers of documents): database access is truly random, and the database is too large for any significant percentage of it to fit into the cache, resulting in a single I/O per requested key/data pair.
Both the database and the log are on a single disk. This means that for each transaction, the Alfresco DB is potentially performing several filesystem operations:
Disk seek to database file
Database file read
Disk seek to log file
Log file write
Flush log file information to disk
Disk seek to update log file metadata (for example, inode information)
Log metadata write
Flush log file metadata to disk
Faster disks can normally help in such situations, but there are lots of ways (scale up, scale out) to increase transactional throughput.
In Alfresco, the default RDBMS configurations are normally not suitable for large repository deployments and may result in:
ā¢ I/O bottlenecks in the RDBMS throughput
ā¢ Excessive queue for transactions due to overload of connections
ā¢ On active-active cluster configurations, excessive latency
The Alfresco database thread pool.
Most Java application servers have higher default settings for concurrent access, and this, coupled with other threads in Alfresco (non-HTTP protocol threads, background jobs, etc.) can quickly result in excessive contention for database connections within Alfresco, manifesting as poor performance for users.
If Tomcat is being used, this value is normally 275. The setting is called db.pool.max and should be added to your alfresco-global.properties (db.pool.max=275).
http://docs.oracle.com/cd/E17076_04/html/programmer_reference/transapp_throughput.html
0. Concurrent users information is definitely at the top of the importance rank of information to gather for a proper sizing of the user-facing nodes. People often mean considerably different things when they talk about concurrent users, so an agreement is needed on the definition of a concurrent user. Generically, we consider here that a system with an average of N concurrent users means that, on average, over a period of time (30 to 100s think time), the system will receive requests from N different users. For the term concurrent user to have meaning, the think time being considered needs to be clarified. Smaller or bigger think times will in general demand an extrapolation of the value to compare to reference cases.
1. Browse/Download/Write operations are those that do not execute searches. They basically access the database, and this database access takes CPU resources at this layer. The more browse/download and write actions, the more CPU power you need.
2. Components, Protocols and APIs: Some protocols will be more demanding than others and require a more scaled-up or scaled-out system to provide acceptable response times. In general, attention should be paid to some problematic protocols; this is generally not Alfresco's fault but due to the protocol definitions themselves. From a load point of view, CIFS and IMAP are good examples of protocols that affect performance, while other protocols and interfaces such as FTP and WebDAV should be considered reasonably light in comparison.
3. Response Times: Depending on the expected response times, you may have to adjust your sizing for this layer if the required times are very low (strict).
This also depends on the use case, but it is a rule that normally applies to most use cases.
Whenever you scale out your cluster by adding more nodes, you should not expect linear scalability, i.e. that double the load on a cluster of twice the size (more nodes) will necessarily deliver the same response time. In fact, in most cases you should expect some degradation of the response time arising from the extra load of cluster cache synchronization (this effect has been much reduced in the latest Alfresco 4 versions). To compensate for this effect while keeping the same response time, it is in general important to also scale up each node in your cluster a little.
1 node = 1 Quad-core ( 3.2Ghz ā 64 bit )
The memory consumption is normally not very high on these nodes; start with a 6 to 8 GB heap and adjust it considering your monitoring data.
Remember that bigger heaps can lead to long garbage collections and "stop-the-world" pauses if the garbage collector is not tuned for your use case.
You can also add more threads to LibreOffice, allowing for some parallel work; this impacts memory, typically 1 GB for each LibreOffice thread.
The best way to size these nodes is to start with a conservative approach for example 1 CPU quad-core and 8GB of RAM and to setup a proper monitoring scenario gathering the relevant information for any necessary adjustments.
These nodes offload important work from the user-facing Alfresco nodes, and they do not consume a license as they don't receive user requests. They are configured outside of the cluster/load balancer.
Dedicated Alfresco tracking nodes serve as a database and Solr proxy; they are mostly used for FTS transformations.
Sizing Factors
Operations: Percentage of write operations.
Document Details: types, sizes and distribution ratios
Memory is impacted by the number of documents and transactions on the documents.
Note that these nodes normally have their own JVM and application server instance, separated from the solr application server and JVM.
Collaboration use cases will therefore in general be more demanding (keeping the other mentioned factors constant) than backend use cases with simpler authority structures.
What do we mean by authority structure? We mean how our user identities are organized from a permission-control point of view: basically, the structure of named users, groups, and subgroups. We will call an authority structure fairly complex when it has a considerable number of hierarchy levels in the group structure, with users belonging to many different groups. On the opposite side, a very simple authority structure is one with a very reduced number of users (mostly system users), each belonging to a fairly small number of groups (5 or fewer).
As we said, real users corresponding to organization charts usually have a complex authority structure (teams, projects, departments, etc.). This is a typical scenario for collaboration use cases, while cases where Alfresco is used as a backend without permission-control responsibilities will in general have the simplest authority structures.
For those cases, the ones most affected will also be the ones with the more complex user-group authority structures, where users belong to many groups on average.
If your case does not fit the scenarios just described, a larger repository size will probably not affect the performance of your system in such a decisive manner. At the same time, it's worth noting that some organizations impose performance requirements not only on average operation response times but also as limit values for specific operations. In those cases, even if search operations, for example, represent just a small portion of the operations executed, they must be taken into account from a sizing point of view in order to fulfill the organization's requirements.
The Solr nodes are the ones that need the most memory. Indexing impacts CPU usage, but memory is the most impacted resource on these servers. Solr performance depends on its caches, and bigger repositories require more memory (bigger caches).
Also take the search ratio into consideration; the formula is generic, and you may have to adjust the memory based on your project's search ratio.
Mention the possibility of having two servers, each indexing one core, or several servers load-balancing the search requests, each one holding a copy of the index with its own tracking server. Another possibility is to use the new index replication feature, allowing for logical separation and independent tuning of the search and index layers of Solr.
When there are scheduled and frequent bulk ingestion processes, it makes sense to have a dedicated bulk ingestion server. Content ingestion consumes application server threads (and CPU); a dedicated bulk ingestion node offloads that work from the main (user-facing) nodes, maximizing resource usage for real user interactions. Memory consumption is generally low; a 4 GB heap is normally enough.
There is no need to cluster this server's caches with the front end. Properties and/or content are indexed in the background, and the indexes will catch up from the database transactions.
Benefits: No Cache update/invalidation overhead.
To exclude server(s) from the cluster, do not set a cluster name for the bulk load servers in alfresco-global.properties (leave alfresco.cluster.name= empty).
During the ingestion process we expect most of the load to occur on the database, the shared file system (I/O) and the user facing Alfresco nodes.
Rule of thumb: Prefer scale up over scale out as it provides simpler deployment and management.
Beware of network latency to the RDBMS: the Alfresco-to-database link is key. Latency above roughly 5 ms should be treated as the absolute maximum.
Disable behaviours: rule evaluations, cm:auditable, versioning, and quotas (system.usages.enabled=false).
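As a sketch, the bulk ingestion node's alfresco-global.properties could combine the cluster exclusion and quota disabling mentioned above (only alfresco.cluster.name and system.usages.enabled are cited in these notes; other behaviour toggles depend on your version and ingestion tooling):

```properties
# alfresco-global.properties on the dedicated bulk ingestion node

# Leave the cluster name empty so this node stays out of the cluster
# (avoids cache update/invalidation overhead with user-facing nodes)
alfresco.cluster.name=

# Disable quotas during bulk ingestion
system.usages.enabled=false
```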
The performance of the storage where the content store resides is vital for overall Alfresco performance. We recommend fast storage (RAID 5) shared via SAN and connected to the Alfresco servers with optical fiber or, as an alternative, a 100 Gb Ethernet connection. High availability and scalability for the binary content storage layer are vendor dependent. We support different binary content storage options on dedicated advanced file systems/servers.
Rendition sizes vary widely: the SWF rendition can be more than 10 times bigger than the source (small text file), while for a medium .doc file the source can be more than 4 times bigger than the SWF.
Alfresco use cases vary considerably because of the elasticity of the platform. The details in which real implementations differ can be very important from an architecture and sizing perspective.
Let's check the differences between two common use cases: Collaboration and Backend Repository.
Generically we expect some differences between these cases, but bear in mind that only the details of your use case will tell you which aspects really matter for you and which differences do not apply.
NOT FOR PRESS
------------------------
A mapping between backend concurrent users (shorter think times) and generic concurrent users (60 to 100 s think time) may be needed to compare against collaboration sizing references, although care should be taken since collaboration concurrency is usually more demanding (different users, many groups per user).
Most Java application servers have higher default settings for concurrent access, and this, coupled with other threads in Alfresco (non-HTTP protocol threads, background jobs, etc.) can quickly result in excessive contention for database connections within Alfresco, manifesting as poor performance for users.
If Tomcat is being considered, this value is normally 275. The setting is called db.pool.max and should be added to your alfresco-global.properties (db.pool.max=275).
After increasing the size of the Alfresco database connection pool, you must also increase the number of concurrent connections your database can handle, to at least the size of the Alfresco connection pool. Alfresco recommends configuring at least 10 more connections to the database than is configured into the Alfresco connection pool, to ensure that you can still connect to the database even if Alfresco saturates its connection pool.
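Putting the two recommendations together, a sketch of the configuration (the MySQL variable name is given for illustration; use your vendor's equivalent):

```properties
# alfresco-global.properties -- Alfresco's DB connection pool size
db.pool.max=275

# On the database side, allow at least pool size + 10 connections,
# e.g. for MySQL (my.cnf):  max_connections=285
```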
Version 5 will have new values for this.
Considering our existing customers, the biggest repositories in production run on Oracle (most of them RAC).
An active-active cluster can be achieved using Oracle RAC, or with a MySQL-based solution using HAProxy (open source) or a commercial solution like MariaDB or Percona.
You can use either solution, depending also on the knowledge you have in-house.
At this layer we don't recommend using any virtualization technology; the database servers should be physical servers.
The high availability and scalability for database is vendor dependent and should be addressed with the chosen vendor to achieve the maximum performance possible.
Next, I present a series of golden rules that you can apply at each of the layers we've previously discussed.
No document text extraction occurs on these nodes. Those operations are handled by the dedicated tracking Alfresco instance on the Solr server; this is one of the reasons for having a dedicated tracking Alfresco instance, as it offloads from the user-facing Alfresco nodes the text extraction work, which can be resource consuming depending on the size of the documents.
To create document previews and thumbnails, Alfresco needs to transform some of the content into small PNG files. When the Transformation Server is not being used, Alfresco is responsible for these transformations, for which it uses OpenOffice; this can be heavyweight and resource consuming.
Transformations for previews and thumbnail generation: note that when the Alfresco Transformation Server is being used, it will take care of all MS Office document thumbnail and preview transformations, but the rest of the MIME types will still be handled by these nodes.
Transformations via PDF: for thumbnails, both OpenOffice.org and ImageMagick are used; for SWF (via PDF), both OpenOffice.org and pdf2swf are used.
User operations consume application server threads. The more concurrency, the more threads will be occupied and the more CPU will be used. Size the maximum number of application server threads (and the number of cluster nodes) according to your expected concurrency.
Note that each thread normally holds a database connection, so concurrency impacts both alfresco cluster nodes and the database.
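The relationship between concurrent users, think time, and busy threads can be sketched with Little's law (an approximation added here for illustration; the example numbers are assumptions, not figures from these notes):

```python
def busy_threads(concurrent_users: int, think_time_s: float,
                 response_time_s: float) -> float:
    """Approximate number of simultaneously busy application server
    threads (and, roughly, DB connections) via Little's law: each user
    cycles through think time + response time, and only occupies a
    thread during the response time."""
    cycle = think_time_s + response_time_s
    return concurrent_users * response_time_s / cycle

# e.g. 1000 concurrent users, 60 s think time, 2 s average response:
# about 1000 * 2 / 62 ~= 32 busy threads on average
```

This also illustrates why backend use cases with short think times need far more threads per named user than collaboration use cases with 60 to 100 s think times.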
Find the exact number of nodes in the store (N), exact number of transactions in the repository (T), number of ACLs (A) and related ACL transactions in the repository (X).
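These four counts drive the index memory estimates. As an illustrative sketch only (the bytes-per-entry coefficient is a placeholder assumption, not an official Alfresco figure; calibrate it against your own monitoring data):

```python
def index_memory_estimate_gb(nodes: int, txns: int, acls: int,
                             acl_txns: int,
                             bytes_per_entry: int = 120) -> float:
    """Rough memory footprint estimate driven by the four counts
    N (nodes), T (transactions), A (ACLs), X (ACL transactions).
    bytes_per_entry is a hypothetical coefficient for illustration."""
    total_bytes = (nodes + txns + acls + acl_txns) * bytes_per_entry
    return total_bytes / (1024 ** 3)
```

The point of the sketch is the shape of the dependency: memory grows with all four counts, which is why excluding documents from the index (next point) directly reduces the memory requirement.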
Since everything scales with the number of documents in the index, add the index control aspect to the documents you do not want in the index.
Aim is to define common standard figure that can empirically define the capacity of a repository instance.
This methodology helps define a common standard figure that can empirically express the capacity of a repository instance.
The name (C.A.R.) stands for "Capacity of an Alfresco Repository"
Decreasing the ERT means needing to increase the capacity of the Alfresco repository server.
On average, over a period of time (the "think time"), the system will receive requests from N different users.
Fine-grain our definitions by considering that create/update operations are more resource consuming than read and delete operations, on both the repository and the database server.
Out of the box, Alfresco also provides features to enable and/or disable selected sub-actions for content auditing. This is required when you want to track content auditing for specific actions only. For example, to enable only the READ audit action and basic audit logging in Alfresco 4.0:

1. Add the following configuration to alfresco-global.properties:
audit.enabled=true
audit.alfresco-access.enabled=true
audit.alfresco-access.sub-actions.enabled=true
audit.filter.alfresco-access.transaction.action=READ;~updateNodeProperties;~addNodeAspect;~UPDATE CONTENT;~CREATE;~CHECK OUT;~CHECK IN
audit.filter.alfresco-access.transaction.sub-actions=readContent;~updateNodeProperties;~deleteNode;~createNode;~createContent;~checkOut;~copyNode

2. Add the following configuration to log4j.properties:
log4j.logger.org.alfresco.repo.audit.access.AccessAuditor=debug
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=trace
The formula shifts and may be adapted with more ERT objects that define the use case, for fine-tuned predictions.