In this talk we will provide an overview of the great new features and architectural options of Alfresco 4 around scalability, performance and benchmarking. With a solution-oriented focus on the most common large-scale Alfresco use cases, we will show the scalability and consistency implications of, amongst others, Apache Solr integration, optional in-transaction indexing, redesigned permission checking, and clustering of filesystem interfaces (e.g. CIFS). Finally, we will introduce the objectives and practical results expected from the currently ongoing benchmark for Alfresco 4.
Alfresco is the largest open source content management company. It provides a content management platform, records management, web content services, and enterprise collaboration features. Alfresco can be deployed in scalable and reliable configurations including clustering servers, replicating content stores, and database clustering to improve performance and reduce points of failure. It also offers options for working outside the firewall such as a hosted cloud service.
The document discusses performance tuning of Alfresco. It covers JVM tuning including memory and garbage collection settings. It also discusses analyzing garbage collection logs and common problems. The document outlines different cache mechanisms in Alfresco including L1, L2 caches and Hazelcast caching. Tuning caches based on data change frequency and hit ratios is recommended. Finally, the document provides guidance on investigating performance issues by examining logs, threads, databases, storage and Alfresco/Solr configurations and settings.
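The GC-log analysis mentioned above can be sketched in a few lines. This is a minimal illustration, assuming classic HotSpot-style log entries; the regular expression and the one-second "long pause" threshold are assumptions for this sketch, not Alfresco or JVM defaults.

```python
import re

# Matches classic HotSpot GC log entries such as:
#   [GC (Allocation Failure)  512K->128K(2048K), 0.0012345 secs]
# This format is an assumption; it varies by JVM version and GC flags.
PAUSE_RE = re.compile(r"\[(?:Full )?GC.*?,\s*([0-9.]+)\s*secs\]")

def gc_pauses(log_lines):
    """Return the list of GC pause durations (in seconds) found in the log."""
    pauses = []
    for line in log_lines:
        m = PAUSE_RE.search(line)
        if m:
            pauses.append(float(m.group(1)))
    return pauses

def summarize(pauses, warn_threshold=1.0):
    """Summarize pause behaviour; the 1-second threshold is illustrative."""
    if not pauses:
        return {"count": 0, "total": 0.0, "max": 0.0, "long_pauses": 0}
    return {
        "count": len(pauses),
        "total": round(sum(pauses), 4),
        "max": max(pauses),
        "long_pauses": sum(1 for p in pauses if p >= warn_threshold),
    }

if __name__ == "__main__":
    sample = [
        "[GC (Allocation Failure)  512K->128K(2048K), 0.0012345 secs]",
        "[Full GC (Ergonomics)  1024K->900K(2048K), 1.2345678 secs]",
    ]
    print(summarize(gc_pauses(sample)))
```

In practice you would feed this the file written via `-Xloggc:<file>`, and a rising count of long pauses would point at the heap-sizing and collector settings the document discusses.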
The document discusses large-scale deployments of Alfresco, an enterprise content management system. It addresses clustering the key components for scalability and reliability: the application server, the databases, the content stores, and the application itself. Clustering these allows for load balancing and reduces single points of failure. Replicating content stores is also discussed as a way to synchronize content between stores.
The document provides an overview and best practices for tuning an Alfresco installation for performance. It discusses disabling unused services, limiting folder hierarchies and group nesting, monitoring resources, tuning Solr indexes and caches, and using separate servers for specific tasks like indexing. General tips include testing changes thoroughly before deploying, adjusting sizing for increased usage, and following the standard performance methodology.
In this session, we'll discuss architectural, design and tuning best practices for building rock solid and scalable Alfresco Solutions. We'll cover the typical use cases for highly scalable Alfresco solutions, like massive injection and high concurrency, also introducing 3.3 and 3.4 Transfer / Replication services for building complex high availability enterprise architectures.
Infrastructure, use cases and performance considerations for an enterprise-grade ECM implementation of up to 1B documents on AWS (Amazon Web Services EC2 and Aurora), based on the Alfresco Platform (http://www.alfresco.com), a leading open source Enterprise Content Management system.
The document discusses an approach to addressing the "right to be forgotten" requirement of the GDPR using an integrated solution with Alfresco, computer vision, and natural language processing. The solution includes a GDPR Watchdog subsystem that uses machine learning models to analyze content for personal data. It can detect information in images using computer vision and in text using NLP. The subsystem exposes a GDPR service and integrates with Alfresco through a webscript and a repository action. A demonstration of the solution is provided.
Sizing an Alfresco infrastructure has always been an interesting topic with many unanswered questions. There is no perfect formula that can accurately determine the right sizing for your architecture given your use case. However, we can provide you with valuable guidance on how to size your Alfresco solution by asking the right questions, collecting the right numbers, and making the right assumptions in a very practical sizing exercise.
How many Alfresco servers will you need in your Alfresco cluster? How many CPUs/cores do you need on those servers to handle your estimated user concurrency? How do you estimate the sizing and growth of your storage? How much memory do you need on your Solr servers? How many Solr servers do you need to get the response times you require? What are the golden rules that can drive and maintain the success of an Alfresco project?
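A back-of-envelope sketch of the arithmetic behind these questions is shown below. Every constant in it (10% concurrency, 10 concurrent users per core, 8 cores per server, 30% overhead, 20% yearly growth) is a hypothetical planning assumption for illustration, not an Alfresco-published rule; replace them with numbers measured for your own use case.

```python
import math

def estimate_repository_nodes(named_users, concurrency_ratio=0.10,
                              concurrent_users_per_core=10, cores_per_server=8):
    """Rough estimate of repository servers needed for a user population.
    All ratios are hypothetical planning assumptions."""
    concurrent = math.ceil(named_users * concurrency_ratio)
    cores = math.ceil(concurrent / concurrent_users_per_core)
    return max(1, math.ceil(cores / cores_per_server))

def estimate_storage_gb(docs, avg_doc_mb=0.5, versions_per_doc=2,
                        overhead_ratio=0.30, yearly_growth=0.20, years=3):
    """Rough content-store size in GB, including versions, index/temp
    overhead, and compound yearly growth. Constants are illustrative."""
    base_gb = docs * avg_doc_mb * versions_per_doc / 1024
    with_overhead = base_gb * (1 + overhead_ratio)
    return round(with_overhead * (1 + yearly_growth) ** years, 1)

if __name__ == "__main__":
    print(estimate_repository_nodes(2000))   # 2000 named users
    print(estimate_storage_gb(1_000_000))    # 1M documents, 3-year horizon
```

The point of such a sketch is not the specific answer but making every assumption explicit, so it can be challenged and re-measured as the project grows.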
The objective of this article is to describe what to monitor in and around Alfresco in order to have a good understanding of how the applications are performing and to be aware of potential issues.
HBaseCon 2012 | HBase Filtering (Lars George, Cloudera)
This talk will run through the list of filters that are shipped with HBase and show how they are used from a client application. Filters expose varying feature sets, but also exhibit an equally varying impact on read performance – but neither are directly intuitive. A skilled HBase practitioner should know how to select the proper filter for a given use-case, or how to combine sets of filters to achieve what is needed. The talk will conclude with an example for a custom filter and explain how to deploy it on a cluster.
Apache Impala is a complex engine and requires a thorough technical understanding to utilize it fully. Without proper configuration or usage, Impala’s performance becomes unpredictable, and end-user experience suffers. However, for many users and administrators, the right configuration of Impala is still a mystery.
Drawing on work with some of the largest clusters in the world, Manish Maheshwari shares ingestion best practices to keep an Impala deployment scalable and details admission control configuration to provide a consistent experience to end users. Manish also takes a high-level look at Impala’s query profile, which is used as a first step in any performance troubleshooting, and discusses common mistakes users and BI tools make when interacting with Impala. Manish concludes by detailing an ideal setup to show all of this in practice.
Speaker: Jean-Daniel Cryans (Cloudera)
HBase Replication has come a long way since its inception in HBase 0.89 almost four years ago. Today, master-master and cyclic replication setups are supported; many bug fixes and new features like log compression, per-family peers configuration, and throttling have been added; and a major refactoring has been done. This presentation will recap the work done during the past four years, present a few use cases that are currently in production, and take a look at the roadmap.
HBase 2.0 is the next stable major release for Apache HBase, scheduled for early 2017. It is the biggest and most exciting milestone release from the Apache community since 1.0. HBase 2.0 contains a large number of features that have been in development for a long time, including rewritten region assignment, performance improvements (RPC, a rewritten write pipeline, etc.), async clients, a C++ client, off-heap memstore and other buffers, Spark integration, and shading of dependencies, as well as many other fixes and stability improvements. We will go into technical detail on some of the most important improvements in the release, as well as the implications for users in terms of API and upgrade paths. Existing users of HBase/Phoenix, as well as operators managing HBase clusters, will benefit the most, learning about the new release and its long list of features. We will also briefly cover the earlier 1.x release lines, compatibility and upgrade paths for existing users, and conclude with an outlook on the next set of initiatives for the project.
HBase and HDFS: Understanding FileSystem Usage in HBase (enissoz)
This document discusses file system usage in HBase. It provides an overview of the three main file types in HBase: write-ahead logs (WALs), data files, and reference files. It describes durability semantics, IO fencing techniques for region server recovery, and how HBase leverages data locality through short-circuit reads, checksums, and block placement hints. The document is intended to help readers understand HBase's interactions with HDFS for tuning IO performance.
Tuning Apache Ambari performance for Big Data at scale with 3000 agents (DataWorks Summit)
Apache Ambari manages Hadoop at large scale, and it becomes increasingly difficult for cluster admins to keep the machinery running smoothly as data grows and clusters scale from 30 to 3000 agents. To test at scale, Ambari has a Performance Stack that allows a VM to host as many as 50 Ambari Agents. The simulated stack, at 50 Agents per VM, can stress-test the Ambari Server with the same load as a 3000-node cluster. This talk will cover how to tune the performance of Ambari and MySQL, and share performance benchmarks for features like deploy times, bulk operations, installation of bits, and Rolling and Express Upgrades. Moreover, the speaker will show how to use the Ambari Metrics System and Grafana to plot performance, detect anomalies, and pinpoint tips on how to improve performance for a more responsive experience. Lastly, the talk will discuss roadmap features in Ambari 3.0 for improving performance and scale.
The document discusses best practices for operating and supporting Apache HBase. It outlines tools like the HBase UI and HBCK that can be used to debug issues. The top categories of issues covered are region server stability problems, read/write performance, and inconsistencies. SmartSense is introduced as a tool that can help detect configuration issues proactively.
The document discusses several key topics in Apache HBase:
1. Procedure version 2 introduces a new framework for running operations like create/drop table and region assignment as procedures with distinct phases.
2. Assignment Manager version 2 uses procedures and improves region assignment and load balancing.
3. Backup/restore now supports HDFS, S3, ADLS and WASB. Snapshots can also be used for backup.
4. Compacting memstore allows in-memory flushing and compaction to improve performance through pipelining.
Performance evaluation of Cloudera Impala, with comparison to Hive (Yukinori Suda)
This document evaluates the performance of Cloudera Impala, an open-source SQL query engine for Apache Hadoop, and compares it to Apache Hive. It describes Impala's architecture and how the benchmark was conducted. The benchmark found Impala to be over 10 times faster than Hive for the modified TPC-H query, with the fastest Impala version taking 14.337 seconds compared to 164.161 seconds for Hive. The document concludes that future versions of Impala integrated with CDH5 may provide even better performance by supporting additional file formats.
(DAT309) Scaling Massive Content Stores with Amazon Aurora (Amazon Web Services)
John Newton, founder and CTO of Alfresco, describes how Amazon Aurora enables the Alfresco Content Management System to store, manage, and retrieve billions of documents and related information with fast and linear scalability. Using new techniques of information modeling, indexing, and processing with the recently launched Aurora database, Alfresco can support cloud-based workloads previously not possible for high-throughput insurance, banking, and case-based applications. This session addresses the challenges of scaling document repositories to this level; architectural approaches for coordinating data; search and storage technologies such as Aurora, Solr, Amazon EBS, and Amazon S3; the breadth of use cases that modern content systems need to support; and how to support user applications that require subsecond response times. The result is a solution that once would have required large data centers to support but can now be handled cost-effectively with AWS and Aurora.
The document summarizes the HBase 1.0 release which introduces major new features and interfaces including a new client API, region replicas for high availability, online configuration changes, and semantic versioning. It describes goals of laying a stable foundation, stabilizing clusters and clients, and making versioning explicit. Compatibility with earlier versions is discussed and the new interfaces like ConnectionFactory, Connection, Table and BufferedMutator are introduced along with examples of using them.
Speakers: Jingcheng Du and Ramkrishna Vasudevan (Intel)
As HBase continues to expand in application and enterprise or government deployments, there is a growing demand for storing data across geographically distributed datacenters for improved availability and disaster recovery. The Cross-Site BigTable extends HBase to make it well-suited for such deployments, providing the capabilities of creating and accessing HBase tables that are partitioned and asynchronously backed-up over a number of distributed datacenters. This talk reveals how the Cross-Site BigTable manages data access over multiple datacenters and removes the data center itself as a single point of failure in geographically distributed HBase deployments.
Impala's low-latency SQL queries for HDFS files motivated improvements to HDFS to better support Impala's needs. These included exposing block replica disk locations, allowing co-located block replicas, in-memory caching of hot files, and reduced data copying during reads. The changes helped Impala achieve significantly faster performance than Hive for queries, especially complex queries, by optimizing I/O and data locality.
My talk at ScaleConf 2017 in Cape Town on some tips and tactics for scaling WordPress, with reference to WordPress.com and the container-based VIP Go platform.
Video of my talk is here: https://www.youtube.com/watch?v=cs0DcY80spw
The document outlines topics covered in "The Impala Cookbook" published by Cloudera. It discusses physical and schema design best practices for Impala, including recommendations for data types, partition design, file formats, and block size. It also covers estimating and managing Impala's memory usage, and how to identify the cause when queries exceed memory limits.
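As a hedged illustration of the memory-estimation theme above, the sketch below applies a common rule of thumb: a hash join's peak memory is dominated by its build side (rows times row width). The 1.5x overhead factor is a hypothetical fudge factor for illustration, not an Impala-documented constant.

```python
# Back-of-envelope estimate of hash-join build-side memory versus a
# per-query memory limit. All constants here are illustrative assumptions,
# not values published in the Impala Cookbook.

def hash_join_build_mb(build_rows, avg_row_bytes, overhead=1.5):
    """Rough estimate of hash-join build-side memory in MB."""
    return build_rows * avg_row_bytes * overhead / (1024 * 1024)

def fits_in_mem_limit(build_rows, avg_row_bytes, mem_limit_mb):
    """Check whether the estimated build side fits under a per-query limit."""
    return hash_join_build_mb(build_rows, avg_row_bytes) <= mem_limit_mb

if __name__ == "__main__":
    # 50M build rows at ~64 bytes each against a 2 GB per-query limit
    est = hash_join_build_mb(50_000_000, 64)
    print(round(est), fits_in_mem_limit(50_000_000, 64, 2048))
```

When an estimate like this exceeds the limit, the usual responses the cookbook discusses apply: put the smaller table on the build side, partition more aggressively, or raise the per-query memory limit.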
This talk delves into the many ways a user can put HBase to work in a project. Lars will look at many practical examples based on real applications in production, for example at Facebook and eBay, and the right approach for those wanting to find their own implementation. He will also discuss advanced concepts such as counters, coprocessors and schema design.
SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 - AUSPC2012 (Michael Noel)
Using SQL Server 2012 AlwaysOn Availability Groups for failover of SharePoint 2010 Databases, as presented at the Australian SharePoint Conference - March 2012 in Melbourne.
Solr is an open source enterprise search platform built on Apache Lucene. It provides powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr powers the search capabilities of many large websites and is highly scalable, fault tolerant, and easy to use.
A market leader in technical training has organized a workshop on Oracle technology to help students build careers as engineers. The workshop will cover the physical and logical structure of Oracle databases, administrative tasks, features for managing storage, data, availability, security and performance. Specific topics will include backup methods, Oracle tasks like cold backups, information about Oracle Corporation, Automatic Storage Management, partitioning, XML capabilities and security products. The goal is to help students gain the knowledge and skills needed to work with Oracle databases.
JavaOne 2009 - Full-Text Search: Human Heaven and Database Savior in the Cloud (Aaron Walker)
This document discusses full-text search capabilities using Hibernate Search. It provides an overview of how Hibernate Search allows for full-text search across objects by transparently indexing them and integrating with Lucene. It also discusses how Hibernate Search enables querying, projections, indexing replication and sharding. The document then presents a case study of an automotive classifieds website that uses these Hibernate Search features for scalability on Amazon Web Services.
Alfresco is the largest open source content management company. It provides a content management platform, records management, web content services, and enterprise collaboration features. Alfresco can be deployed in scalable and reliable configurations including clustering servers, load balancing, database replication, and separate index servers to improve performance and reduce points of failure. It also offers options for working outside the firewall through a hosted cloud service.
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...Michael Noel
Using SQL Server 2012 AlwaysOn Availability Groups allows for high availability and disaster recovery of SharePoint 2013 farms. It provides zero data loss failover between nodes and readable secondary replicas. The document outlines the requirements and provides a step-by-step guide to implementing AlwaysOn Availability Groups for a SharePoint farm, including creating an availability group, adding databases, and creating an availability group listener.
Video that accompanies this presentation at: http://www.youtube.com/watch?v=1t3Z2pJyulA
Join us for a guided tour of the Alfresco SOLR integration and new search sub-systems. We’ll discuss how it works, the limitations of eventual consistency, guidance for configuration and set-up. We’ll also cover the steps required to migrate, improved PATH performance, in-query ACL evaluation, cross-language support and monitoring as well as performance.
Streaming Solutions for Real time problemsAbhishek Gupta
The document is a presentation on streaming solutions for real-time problems using Apache Kafka, Kafka Streams, and Redis. It begins with an introduction and overview of the technologies. It then presents a sample monitoring application using metrics from multiple machines as a use case. The presentation demonstrates how to implement this application using Kafka as the event store, Kafka Streams for processing, and Redis as the state store. It also shows how to deploy the application components on Oracle Cloud.
CETPA INFOTECH PVT LTD is one of the IT education and training service provider brands of India that is preferably working in 3 most important domains. It includes IT Training services, software and embedded product development and consulting services.
http://www.cetpainfotech.com
Deep Dive: Alfresco Core Repository (... embedded in a micro-services style a...J V
Alfresco Summit 2014 (London)
Though best practice is to leverage Alfresco through the well defined API's, it can be useful to understand the internals of the repository so that your development efforts are the most effective. A deep understanding of the repository will help you to evaluate performance bottlenecks, look for bugs, or make contributions. This session provides an overview of the repository internals, including the major components, the key services, subsystems, and database. We then provide an example where we leverage the repository in a micro-service architecture while building Alfresco's future cloud products and show how the different parts of the repository interact to fulfill requests.
http://summit.alfresco.com/london/sessions/diving-deep-alfresco-repository
https://www.youtube.com/watch?v=TAE9UjC0xxc
Presented by Mark Miller, Software Developer, Cloudera
Apache Lucene/Solr committer Mark Miller talks about how Solr has been integrated into the Hadoop ecosystem to provide full text search at "Big Data" scale. This talk will give an overview of how Cloudera has tackled integrating Solr into the Hadoop ecosystem and highlights some of the design decisions and future plans. Learn how Solr is getting 'cozy' with Hadoop, which contributions are going to what project, and how you can take advantage of these integrations to use Solr efficiently at "Big Data" scale. Learn how you can run Solr directly on HDFS, build indexes with Map/Reduce, load Solr via Flume in 'Near Realtime' and much more.
- Laravel is a popular PHP MVC framework that provides tools like Eloquent ORM, Blade templating, routing, and Artisan CLI to help developers build applications faster.
- Key Laravel features include Eloquent for database access, Blade templating engine, routing system, middleware, and Artisan CLI commands for common tasks like migrations and seeding.
- The document discusses Laravel's file structure, installing via Composer, and provides best practices for coding with Laravel like avoiding large queries and using middleware, validation, and CSRF protection.
Jose portillo dev con presentation 1138Jose Portillo
This document discusses best practices for implementing Solr sharding in Alfresco. It defines what sharding is and explains that it involves splitting a single index into multiple parts or shards to improve search performance, distribute indexing load, and scale horizontally. The document outlines different types of sharding, considerations for the number of shards, high availability, backup procedures, and common configuration settings when using Solr sharding in Alfresco.
A Practitioner's Guide to Successfully Migrate from Oracle to Sybase ASE Part 2Dobler Consulting
This document provides an overview of migrating from Oracle to Sybase ASE. It discusses comparing the key differences between Oracle and Sybase ASE, including processes, case sensitivity, storage architecture, transactions, parallel execution and more. It also covers performing a portability check to identify migration issues and develop workarounds, such as how to handle Oracle triggers, synonyms, sequences, materialized views and different table types in Sybase ASE. The document is intended to help successfully migrate applications with minimal code rewrites.
This document provides an overview and introduction to Cosmos DB. It discusses what Cosmos DB is, its data models, APIs, partitioning, and global distribution. It explains why Cosmos DB was created to address limitations of traditional databases. Key aspects covered include throughput and consistency levels, indexing, backups, failovers, and using Cosmos DB for developers and database administrators. The document also discusses migration tools, limitations, and integrations with PowerBI and geospatial data.
This document discusses integrating Apache Solr with Apache Hadoop for big data search capabilities. It provides background on Mark Miller and the history of search on Hadoop. It outlines how Solr, Lucene, Hadoop, and related projects can be integrated to allow full-text search across large datasets in HDFS. Specific integration points discussed include allowing Solr to read and write directly to HDFS, custom directory support in Solr, replication support, and using Morphlines for extraction, transformation, and loading of data into Solr.
The document outlines the hardware and software requirements, new features, and best practices for designing and implementing an infrastructure for SharePoint 2013, including new service applications, the use of claims-based authentication, and recommendations for farm topology based on organization size from single server to large virtual environments. It also discusses high availability, disaster recovery, security, and optimization strategies.
"Event streaming platforms like Kafka have traditionally leaned on ZooKeeper as the cornerstone for coordination and metadata management. This presentation introduces Oxia, a compelling alternative solution.
Hailing from the labs of StreamNative, Oxia brings forth a genuinely horizontally scalable metadata framework. It empowers distributed messaging systems to seamlessly handle hundreds of millions of topics, all while removing the intricacies and operational burdens associated with ZooKeeper.
The transformative potential of Oxia extends to developers' messaging strategies and application architectures. It holds the promise of simplifying both, marking a significant evolution in the event streaming landscape."
This document summarizes the key differences between Oracle and SQL Server databases from the perspective of an Oracle DBA. It discusses that while Oracle DBAs may not get as much respect as SQL Server DBAs, an DBA needs to be able to work with multiple database platforms. It then highlights some of the major technical differences between instances, database files, redo logs, users, and backup/recovery in the two systems.
The document provides an overview of tuning the Oracle E-Business Suite environment. It discusses tuning the applications tier, concurrent manager, client tier and network, database tier, and applications. Specific tips are provided for each area, such as upgrading technology stacks, minimizing network traffic, using specialized managers, enabling SQL tracing and profiling, and isolating the database and applications tiers on a private network.
This document provides a high-level overview of the Alfresco content management platform in 3 sentences: Alfresco is an open source enterprise content management platform that can manage files and metadata, provides search and security features, and includes a workflow engine and APIs to build custom applications. It discusses Alfresco's architecture, developer setup process, development model using APIs and extensions, and demos the platform's capabilities. The document is intended to introduce developers to building applications and customizing Alfresco.
The document discusses advancing digital business through improving digital flow. Digital flow is defined as connecting people, processes, and information quickly, seamlessly, and effortlessly. It discusses how intelligent activation of information based on context can improve access and availability. Case studies are provided of organizations that improved processes like clinical collaboration, courtroom efficiency, and client onboarding by advancing their digital flow with Alfresco's platform. The document encourages readers to evaluate their technology strategy and where they can improve customer experience and efficiency.
Alfresco Day BeNelux: Customer Success Showcase - Credendo GroupAlfresco Software
Delcredere|Ducroire, Belgium's export credit agency, implemented a digital insurance underwriting solution using Alfresco to address challenges with its previously paper-based processes. The project team prepared extensively and prioritized requirements before developing the solution using an agile approach over multiple sprints. Employees now have personal dashboards, case overviews and details, and configuration capabilities in the new paperless system. Feedback has been positive about the uniformity, transparency, and speed improvements versus previous paper file-based processes.
Alfresco Day BeNelux: Digital Transformation - It's All About FlowAlfresco Software
This document summarizes John Newton's keynote presentation at Alfresco Day 2016 in Amsterdam. The presentation focused on accelerating digital business through design thinking, platform thinking, and open thinking. Newton discussed how these approaches can help organizations transform customer experiences, become digital disruptors, and gain business insights from big data. He argued that design thinking empowers users, platform thinking accelerates delivery and engagement, and open thinking fosters innovation. Newton provided examples of how various organizations have applied these concepts to streamline operations, engage customers, fuel innovation, and support transparency.
Alfresco Day Vienna 2016: Support Tools für die Admin-KonsoleAlfresco Software
The document discusses Alfresco Support Tools, which provide administrative tools for monitoring and troubleshooting Alfresco One. It begins with an overview of how Alfresco One administration can be challenging without external tools. It then describes the tools included in Alfresco Support Tools, such as live graphs of system performance, tracking of active sessions, identifying hot threads, taking thread dumps, and configuring log4j settings. The document acknowledges contributions to the project and provides instructions for downloading and installing the Alfresco Support Tools.
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...Alfresco Software
This document summarizes John Newton's keynote presentation at Alfresco Day 2016 in Vienna. The presentation focused on how organizations can accelerate their digital transformation through design thinking, platform thinking, and open thinking. Newton discussed how these approaches can help transform customer experiences, become digital disruptors, and turn data into business insights. He also outlined Alfresco's digital platform and open source services which aim to simplify digital journeys, fuel innovation, and support organizations' digital transformations.
Alfresco Day Warsaw 2016: Advancing the Flow of Digital BusinessAlfresco Software
1) The document discusses how digital flow, or the seamless connection of people, processes, and information, can maximize productivity and efficiency for businesses.
2) It provides examples of large companies across various industries that have saved millions of dollars by improving digital flow through better collaboration, information access, and process automation using Alfresco's content services platform.
3) Alfresco argues that their open platform, large partner network, and experience helping top companies advance their digital transformations makes them well-suited to help other organizations looking to improve customer experience and operating efficiency through better digital flow.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Digital Marketing Trends in 2024 | Guide for Staying AheadWask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Webinar: Designing a schema for a Data WarehouseFederico Razzoli
Are you new to data warehouses (DWH)? Do you need to check whether your data warehouse follows the best practices for a good design? In both cases, this webinar is for you.
A data warehouse is a central relational database that contains all measurements about a business or an organisation. This data comes from a variety of heterogeneous data sources, which includes databases of any type that back the applications used by the company, data files exported by some applications, or APIs provided by internal or external services.
But designing a data warehouse correctly is a hard task, which requires gathering information about the business processes that need to be analysed in the first place. These processes must be translated into so-called star schemas, which means, denormalised databases where each table represents a dimension or facts.
We will discuss these topics:
- How to gather information about a business;
- Understanding dictionaries and how to identify business entities;
- Dimensions and facts;
- Setting a table granularity;
- Types of facts;
- Types of dimensions;
- Snowflakes and how to avoid them;
- Expanding existing dimensions and facts.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
BP-1 Performance and Scalability
1. Alfresco Scalability and Performance
How Alfresco 4.x will solve all your headaches around scalable ECM solutions
2. Agenda
ECM high end scenarios
• ECM platform use cases
• What is ECM scalability?
Alfresco ECM scalability and performance
• What we (should) already know
• Alfresco 3.4 improvements
• Alfresco 4.0 “hardening” features
• Apache Solr
• Clustered filesystems
• Post 4.0 scalability opportunities
Alfresco Platform benchmarks
• Benchmarks rationales and progress
• Alfresco Benchmark Tools
3. Alfresco ECM Solutions
What do you expect from an ECM system?
• ECM semantics grow alongside the ‘content explosion’
• The Classic Alfresco ECM Trio
• System of Record (Massive Injection & Retrieval Content Platform)
• System of Engagement (Enterprise Collaboration Platform)
• Web Content Publishing (Multichannel XML/HTML delivery)
• The present (and future)
• Social Content Management
• Records Management & Archival
• Business Intelligence (Content & Workflow OLAP)
• Each solution has specific requirements around
1. Scalability
2. Information retrieval isolation
4. ECM Scalability defined
What does scalability mean for ECM?
• Performance
• “Acceptable” to “No” search degradation as users/content grow
• “Acceptable” to “No” browsing degradation as users/content grow
• Geo-independence of the system (if global)
• Availability
• Offer service continuity upon
• Disasters
• Functional / corrective maintenance
• No single point of failure
• Load distribution
• Parallelization
• Optimization of CPU usage ($$$) per transaction
Scalability requirements are solution dependent …
So pick your battles!
5. Alfresco is designed to scale
At any level of the infrastructure
• Repository
• Cluster nodes can be added dynamically
• Ehcache supports distributed cache replication
• User interface
• Alfresco Share is stateless and HTTP based
• Share can scale out independently from the repository
• Database
• Alfresco supports Master / Slave DB replication
• Solutions a la Oracle RAC are also supported
• Storage
• Content Addressable Storage (XAM) support – Enterprise Only
• Content Store Selector – Enterprise Only
Check out the Scale your Alfresco Solutions paper!
http://support.alfresco.com/ics/support/DLRedirect.asp?fileID=18158
6. Full-Blown Multi-layer scalable architecture
[Diagram: a load balancer fronting four Share / app-server nodes; a second load balancer fronting four clustered Alfresco repository nodes, each with replicated EHCache and its own index; a master database with a slave for failover (database clustering); and two content stores (Content Store 1 and 2) behind the content store selector.]
7. Levels of ECM Information Isolation
Think about it as “transaction isolation” for databases
1. SERIALIZABLE
• System of records
2. REPEATABLE_READS
3. READ_COMMITTED
4. READ_UNCOMMITTED
8. Where does your solution stand?
Identify where your solution is in the graph, then trade
off between consistency & performance!
There is NO one size fits all, so pick your battles!
9. 10 tips you MUST know to scale Alfresco
1. Disable quotas when unneeded
system.usages.enabled=false
2. Disable audit
audit.enabled=false
3. When ~1M docs and above fine tune #index segments
http://wiki.alfresco.com/index.php?title=Index_Merging_Performance
4. Tune the DB pool size (the default is sized for evaluation only)
db.pool.max=225
5. And don’t forget to tune DB accordingly
Ask your DBA to allow enough incoming connections (especially in high concurrency)
6. Use multi-operation batches for your transactions
Transaction setup and teardown are expensive!
7. For bulk injection you can disable in transaction indexing
index.tracking.disableInTransactionIndexing=true
8. Tune permission checking behavior
system.acl.maxPermissionChecks and system.acl.maxPermissionCheckTimeMillis
9. Read the “Scale your Alfresco Solutions” paper!
10. Call me (or any other Alfresco Consultant)
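The property-based tips above can be gathered into a single alfresco-global.properties fragment. This is a sketch, not a recommendation: the values shown are illustrative starting points, and pool size and permission-check limits in particular must be validated against your own database and concurrency profile.

```properties
# Illustrative alfresco-global.properties tuning fragment (values are examples)

# Tip 1: disable usage quotas when not needed
system.usages.enabled=false

# Tip 2: disable audit if you do not consume audit data
audit.enabled=false

# Tip 4: raise the DB connection pool beyond the evaluation default
db.pool.max=225

# Tip 7: disable in-transaction indexing during bulk injection (Lucene setups)
index.tracking.disableInTransactionIndexing=true

# Tip 8: bound permission checking (limits shown are placeholders)
system.acl.maxPermissionChecks=10000
system.acl.maxPermissionCheckTimeMillis=10000
```

Remember tip 5: raising db.pool.max is only safe if the DBA raises the database's own connection limit to match.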
10. 5 scalability gotchas prior to Alfresco 3.4
1. In-process (in-transaction) content indexing
Alfresco spending transaction time to update Lucene index
2. Lucene index replicated & tracked per cluster node
Additional DB and Alfresco load for IndexTransactionTracker
3. Query ‘bottlenecks’ during index maintenance
High (blocking) CPU spike during index merging
4. Not all interfaces available in High Availability mode
File system interfaces (e.g. CIFS) could not run clustered
5. Time-limited permission checking
Non-deterministic search results on large user / content bases
11. Alfresco 3.4 scalability improvements
• Hibernate removal
• Improved / optimized DB querying with Ibatis
• Faster commit time
• Permission checking improvements
• Ongoing work in all 3.x versions
• Content Replication
• Geographic master/slave distribution of content
• Can be used also for archival, WCM deployment, etc.
• Site performance project (Enterprise 3.4.6)
• High Share concurrency scenarios
• Tested and usable up to 60,000 Share sites!
12. Alfresco 4.0 radical answers to scalability
• Introduction of logical separate indexing tier
• Apache Solr Integration
• NOTE: Eventual vs transactional index consistency
• Clustering for File System interfaces (e.g. CIFS)
• ContentDiskDriver2
• Scenario-specific, session-linked state distributed using Hazelcast
• Deterministic permission checking
• Refactored DB canned queries allow in-query checking
• Solr filters allow in-search permission checking
13. The Apache Solr subsystem
Rationale
• Removes Lucene load from Alfresco Repository
• Externalize and centralize a logical indexing tier
• Avoid per cluster node index tracking
Architectural features
• Pull (vs. push) indexing
• Solr polls Alfresco periodically for index updates
• Default is 15s; the interval is configurable
• Can be scaled out
• Multi-Solr architectural options
• NOTE: sharding/clustering not available yet
• Dedicated or shared Enterprise search engine
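As a sketch of how the Solr subsystem is selected and the poll interval tuned: the first fragment goes in alfresco-global.properties, the second in each core's solrcore.properties. Property names match typical Alfresco 4 setups, but treat the exact names and values as assumptions to verify against your version:

```properties
# alfresco-global.properties: switch the search subsystem to Solr
index.subsystem.name=solr
solr.host=localhost
solr.port=8080
```

```properties
# solrcore.properties: tracker cron, firing every 15 seconds by default
alfresco.cron=0/15 * * * * ? *
```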
16. Solr Implications
Do I have to migrate?
• No, you can still use Lucene indexing subsystem
• Solr can be configured to run and index in parallel
Key Features
• No in-txn indexing
• One core per Alfresco store (e.g. WorkspaceStore, ArchiveStore)
• Cores can be configured separately
• In query permission checking with Solr filters
• Deterministic
Alfresco impacts
• Wherever “transactional consistency” was needed, index queries have been substituted with DB “canned queries”
• Authentication, Doclib, Bootstrap, Check-in/out amongst others
• Implementers should be aware of eventual index consistency
17. Transactional vs. Eventual index update
Transactional (prior to 4.0)
• Indexes updated within the database txn
Pros
• Indexes consistent with DB at any time
• Applications can work independently with DB or indexes
Cons
• Resource intensive
• Slower commit time
• Index locking and contention with concurrent user growth
Eventual (4.0)
• Index server periodically polls repository (default = 15s)
Pros
• Faster commit time (50%)
• Separately scalable tier
• Configurable index delay
• Independent from #(concurrent users)
Cons
• Dirty or non-repeatable index reads are possible
• Cannot be used where transactional consistency is needed
• E.g. RM, AVM, custom apps
18. Clustered file systems
What’s new?
• ContentDiskDriver2: a brand-new implementation
• JLAN-Alfresco interface binds state to sessions
• No clustering required
• JLAN Clustering
• Hazelcast provides distributed locking
http://www.hazelcast.com/
Hazelcast configuration
filesystem.cluster.enabled=[true,false]
Enables or disables the filesystem cluster.
filesystem.cluster.configFile
Location of Hazelcast configuration file
http://wiki.alfresco.com/wiki/Configuring_Hazelcast_for_JLAN_clustering
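A minimal configuration sketch for the two properties above; the config-file path is illustrative (see the wiki page for the actual Hazelcast file layout):

```properties
# alfresco-global.properties (path is illustrative)
filesystem.cluster.enabled=true
filesystem.cluster.configFile=classpath:alfresco/extension/hazelcastConfig.xml
```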
19. Post 4.0 frontiers
Solr
• Index Shards
• Clustering (Read replication)
• Solr on EC2
• Faceted Search
• Term highlighting
Product evolution
• Cloud offering
• Benchmarking process
20. Recap - Remember the gotchas?
1. In-process (or in-transaction) content indexing
Asynchronous indexing with Solr
2. Lucene index replicated & tracked per cluster node
Centralized Solr indexing tier
3. Query ‘bottlenecks’ during index maintenance
Index load / maintenance moved to a separate tier
4. Not all interfaces available in High Availability mode
ContentDiskDriver2 & Hazelcast enable CIFS clustering
5. Time-limited permission checking
Solr filters enable deterministic, query-time permission checking
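The difference behind gotcha 5 and its fix can be sketched as two search strategies: post-filtering with a check cap (results cut short non-deterministically) versus checking permissions inside the query itself. Function names and data are invented for illustration:

```python
def search_post_filtered(candidates, can_read, max_checks):
    """Pre-4.0 style: permissions checked after the query, up to a cap
    (cf. system.acl.maxPermissionChecks), so results can be cut short."""
    out = []
    for i, doc in enumerate(candidates):
        if i >= max_checks:       # cap reached: remaining docs never checked
            break
        if can_read(doc):
            out.append(doc)
    return out


def search_in_query(candidates, can_read):
    """4.0 style: the readability predicate is part of the query itself
    (a Solr filter), so every candidate is checked -- deterministic."""
    return [doc for doc in candidates if can_read(doc)]


docs = [f"doc-{i}" for i in range(10)]
can_read = lambda d: int(d.split("-")[1]) % 2 == 0   # user may read even docs

partial = search_post_filtered(docs, can_read, max_checks=4)
complete = search_in_query(docs, can_read)
# partial misses doc-4/6/8; complete returns all five readable docs
```

With the cap in play, which readable documents survive depends on their position in the candidate list; the in-query variant always returns the same set.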
23. Alfresco Platform Benchmarks (EE 4.0)
What are we going to test?
• Scenario driven
• Bulk loading / massive injection
• Enterprise Collaboration Platform
• Dimensions
• (#content nodes, #users, #cluster nodes)
• Architectural options / versions over time
• Initial data points identified starting from field experience
What are we going to measure?
• Throughput
• Min/Avg/Max Response time
• Cluster scalability
What can you expect?
• Updated Scalability paper with quantitative information
• Initial comparative data with 3.4
24. Alfresco Benchmarks – Why?
For Community and Enterprise network
• Provide quantitative evaluation of Alfresco scalability
• Offer tooling to self benchmark Alfresco in your context
For Engineering Research
• Determine impact of new ideas
• Profile performance issues
For Sizing Guidelines
• How many CPUs do you require?
• How many documents can you store?
For QA
• Performance Regression Tests
• Not a one-off exercise
• Become part of engineering process
25. Alfresco Benchmarks Tools
Repository benchmark suite
• JMeter scripts executable from ANT
• CMIS (Mixed and Sequential)
• WebDav (Mixed and Sequential)
• Available at HEAD/code/root/projects/repository-bm
Alfresco Bulk Import Tool
• Hosted on Google Code
• Now Multi-threaded!
• Offers a “content streaming free” mode
• Great performance (especially with no in-txn indexing)
Collaboration Platform benchmark suite
• JMeter scripts executable from ANT
• Testing Alfresco Share functionality with a configurable number of concurrent users
• Not publicly available at the time of this writing