Apache Con NA 2013 - Cassandra Internals by aaronmorton
The document provides an overview of the architecture and internals of Apache Cassandra. It discusses the client-facing API layer including Thrift, CQL, JMX, and CLI. It then covers the Dynamo layer which handles messaging, distributed hash tables, replication strategies, and gossip protocols. Finally, it summarizes the database layer for managing tables, columns, memtables, SSTables, and read/write paths.
T3 is an optimized protocol used to transport data between WebLogic Server and other Java programs. WebLogic Server tracks each Java Virtual Machine (JVM) it connects to and creates a single T3 connection to carry all traffic for a JVM. For example, if a client accesses an enterprise bean and JDBC connection pool on WebLogic Server, a single network connection is established between the WebLogic Server JVM and the client JVM.
This document provides an overview of the Cassandra codebase, summarizing key classes, startup processes, read and write paths, stages and threading, and bootstrap and streaming processes. It outlines the main controllers like StorageService and MessagingService, as well as lower level classes like Table and ColumnFamilyStore. It also discusses testing, using an IDE with Cassandra, and adding new API methods.
The document compares two methods for limiting the CPU usage of databases on the same server: instance caging and processor_group_name binding. It provides facts about how each method works, observations on performance differences, and examples of customer cases where each method may be best. Instance caging allows the CPU count to be limited online but leaves the SGA interleaved, while processor_group_name binding pins a database to specific CPUs, which requires a restart but keeps the SGA local. The best choice depends on factors like the number of databases and whether guaranteed CPU resources are needed for some databases.
The document describes messages related to problems accessing and modifying the cluster registry (OCR) and cluster database configuration. Common causes include the OCR being inaccessible or invalid configuration entries. Actions include verifying the OCR configuration and permissions, and ensuring the clusterware resources are configured correctly.
This document provides instructions for cloning Oracle Applications Release 12 using Rapid Clone techniques. It describes completing pre-clone steps, then using the adcfgclone.pl script to clone the database tier from the source to target environment. Next, it discusses copying over application files and cloning the applications tier. The process involves running adcfgclone.pl for the database tier and applications tier, entering prompts, and monitoring logs to complete the clone.
This document provides an overview and introduction to contributing to and understanding the internals of Apache Cassandra. It discusses how to contribute code through JIRA issues labeled "Low hanging fruit" and participating in code reviews. It also summarizes the startup sequence, main components like StorageService and StorageProxy, and challenges like the increasing size of the codebase.
The RECOORD scripts allow users to carry out NMR structure calculations using CNS with a standardized protocol. The scripts generate input files for CNS and come with their own forcefields. The protocol generates a topology file from either a primary sequence or a PDB file, then builds an extended structure and runs simulated annealing to calculate an NMR ensemble of multiple models. The scripts analyze violations and allow additional models to be calculated in a standardized, automated way.
This document contains error codes and messages for cluster command messages. It provides causes and recommended actions for various errors that could occur. Some examples are node not accessible due to network or configuration issues, null directory name passed, failed to get hostname, computer name and hostname mismatch, empty registry key values, misconfigured cluster, and failed file or directory operations due to permission issues. Recommended actions include checking networks, configurations, permissions, and contacting support.
This document provides an overview of replicating a PostgreSQL database. It discusses setting up a primary server for reads and writes and standby servers that are kept in sync with the primary to serve as backups. The primary server writes data to its write-ahead log (WAL) files, which are streamed in real-time to the standby servers via WAL shipping. This allows the standby servers to keep an identical copy of the database. The document also covers configuration of both the primary and standby servers for replication as well as tools for testing the replication setup.
This document provides instructions for diagnosing Oracle Clusterware and Real Application Clusters (RAC) high availability components. It describes how to enable and disable Oracle Clusterware daemons, collect diagnostic information, view alerts, enable resource debugging, check clusterware health, and view log files. It also covers diagnosing RAC components, enabling additional tracing, and resolving pending shutdown issues.
The document discusses 12 enhancements to wait event monitoring and analysis in Oracle 10g, including more descriptive wait event names, new columns in views like v$session and v$sqlarea, and new views such as v$event_histogram and v$session_wait_history that provide additional insight. It focuses on improvements that help DBAs more easily understand what sessions are waiting for and identify potential performance bottlenecks through better organized wait event classification and more granular wait time statistics.
RMAN was used to clone an Oracle 10g RAC database from a source database SOURCEC3 to a target database TARGETC3 using the DUPLICATE DATABASE feature. The procedure involved preparing the source and target databases, identifying necessary archive log backups, restoring the database to a single-instance target, and then converting it to a RAC database by adding redo logs and enabling cluster functionality. Post-clone tasks verified the successful conversion to a RAC database and started required processes.
The document provides examples of using Oracle Clusterware commands to create and manage application resources in a cluster. It describes how to create an application profile using crs_profile, register it with crs_register, start the application with crs_start, and other tasks. Guidelines are provided for writing action scripts and using attributes, placement policies, and required and optional resources.
A coordination service like Zookeeper helps distributed applications coordinate by providing common services like synchronization, configuration sharing, naming, and leader election. Zookeeper uses an ensemble of servers running as a cluster. It stores data in a hierarchical namespace of znodes. Clients can read and write znodes, set watches on znodes to get notified of changes, and rely on Zookeeper to handle session and server failures in a transparent way. Some common usage recipes for Zookeeper include barriers for synchronization, cluster management using ephemeral znodes, queues using sequential znodes, locks for mutual exclusion, and leader election.
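As a concrete illustration of the ephemeral-znode membership recipe mentioned above, here is a minimal sketch against the ZooKeeper Java client; the ensemble address, paths and timeout are placeholders, and error handling is omitted.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Minimal sketch of the ephemeral-znode cluster-membership recipe.
public class ZkMembershipSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 10_000,
                event -> System.out.println("session event: " + event.getState()));

        // Parent node for the group (assumes it does not exist yet).
        zk.create("/members", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Ephemeral child: removed automatically when this client's session ends,
        // so the group's children always reflect the live members.
        zk.create("/members/node-1", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // Watch the member list; the watcher fires once when the children change.
        System.out.println("members: " +
                zk.getChildren("/members", event -> System.out.println("membership changed")));

        zk.close();
    }
}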
The document defines several key terms related to Oracle Real Application Clusters (RAC) and high availability:
- Network Time Protocol (NTP) ensures accurate synchronization of computer clock times in a network.
- A node is a machine where an Oracle RAC instance resides.
- Oracle Clusterware manages cluster database processing including node membership and resource management.
- The Oracle Cluster Registry (OCR) manages configuration information for the RAC cluster.
An introduction to_rac_system_test_planning_methods by Ajith Narayanan
This document provides an overview and agenda for testing an Oracle Real Application Clusters (RAC) system. It outlines 10 tests to validate that the RAC system is installed and configured correctly, and to verify basic functionality and the system's ability to achieve high availability and performance objectives. The tests include planned node reboots, unplanned node failures, instance failures, and network failures. Metrics like failover time, recovery time, and downtime are proposed to measure success.
This document provides instructions for setting up Apache Kafka and Spark Streaming to process streaming data from Kafka with Spark. It describes how to install Zookeeper and Kafka, create a Kafka topic, produce and consume messages, and run the KafkaWordCount Spark Streaming example application to perform word count on the streaming data from Kafka. It also explains the different processing semantics supported by Spark Streaming for Kafka integration.
This document discusses Apache Spark and Cassandra. It provides an overview of Cassandra as a shared-nothing, masterless, peer-to-peer database with great scaling. It then discusses how Spark can be used to analyze large amounts of data stored in Cassandra in parallel across a cluster. The Spark Cassandra connector allows Spark to create partitions that align with the token ranges in Cassandra, enabling efficient distributed queries across the cluster.
Over the years, Core Data has gained a pretty bad reputation amongst developers who prefer to use another service like Realm for their local persistence. In this talk I will make an argument for using Core Data and why it's not so bad. I will share some examples of where it's easy to go wrong with Core Data, and how to avoid those pitfalls. I will also quickly go over setting up Core Data in an app and by the end, the audience should have a couple of simple rules that should help them safely integrate Core Data in their apps.
This document provides instructions for replicating data from an Oracle multitenant container database (CDB) to another CDB using Oracle GoldenGate. It outlines prerequisites, tasks to prepare the databases and environment, and steps for initial load and ongoing replication of data changes in near real-time. Key steps include creating GoldenGate users, adding supplemental logging, configuring Extract and Replicat processes, and monitoring replication status. The goal is to familiarize the reader with setting up a basic Oracle to Oracle replication setup using GoldenGate in a multitenant environment.
Apache Cassandra, part 3 – machinery, work with Cassandra by Andrey Lomakin
The aim of this presentation is to provide enough information for an enterprise architect to decide whether Cassandra will be the project's data store. The presentation describes each nuance of Cassandra's architecture and ways to design data and work with it.
The document discusses using R to analyze and visualize Oracle database metrics and statistics in real time. It provides examples of R code to connect to an Oracle database and retrieve system statistics and wait event data. The code then computes changes from the previous snapshot and graphs metrics over time, including system statistics by interval, wait times and events, and wait class distributions. It also describes splitting the screen into multiple graphs to show various views of the real-time data. The goal is to build interactive dashboards to monitor database performance using R.
This document provides a summary of the SRVCTL commands, which are used to administer Oracle cluster databases. SRVCTL allows users to add, configure, start, stop, relocate and get the status of cluster database instances, services and other resources. The summary describes the main SRVCTL commands and their usage, objects they apply to, and provides examples of common commands and options.
HBaseCon 2013: A Developer’s Guide to Coprocessors by Cloudera, Inc.
This document discusses coprocessors in HBase, which allow arbitrary code to run on each region server. It provides examples of using coprocessors for observers that react to events and endpoints that clients can explicitly call. The examples include expanding single-row JSON data into multiple columns, collecting real-time analytics, and optimizing searches through endpoints.
Oracle real application clusters system tests with demo by Ajith Narayanan
This document provides details on testing Oracle Real Application Clusters functionality through a series of tests. It begins with an introduction and agenda, then describes 10 tests to validate high availability features including planned and unplanned node/instance failures, network failures, and service failover. Expected results and measures of success are outlined for each test. Sample scripts are also provided.
The document summarizes new data access features in .NET 2.0, including enhancements to the DataSet class for improved performance, loading data using DataReaders, asynchronous command execution, SQL cache dependency, and support for multiple active result sets and generic collections like List<T> and Dictionary<T,F>.
Apache Cassandra in Bangalore - Cassandra Internals and Performance by aaronmorton
Cassandra internals and performance were presented. The key points covered include:
1) Cassandra has a layered architecture with APIs, a Dynamo layer, and a database layer. The Dynamo layer implements the Dynamo paper and handles replication and failure handling.
2) The database layer includes the memtable, SSTables, commit log and more. It handles writes, flushes, compactions and reads from storage.
3) A number of performance tests were shown measuring the impact of configuration parameters like memtable flush queue size, commit log sync period, and secondary indexes on write and read latency. Bloom filters, compactions and concurrency were also discussed.
Floating on a RAFT: HBase Durability with Apache Ratis by DataWorks Summit
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirement of HBase's write-ahead log (WAL), which HDFS provides correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase and provides the level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library implementation of the RAFT consensus protocol in Java and is used to build this Log Service. We will cover the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, we'll cover how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, we'll discuss how the Log Service can simplify the operational burden of HBase.
This chapter discusses Spark Streaming and provides an overview of its key concepts. It describes the architecture and abstractions in Spark Streaming including transformations on data streams. It also covers input sources, output operations, fault tolerance mechanisms, and performance considerations for Spark Streaming applications. The chapter concludes by noting how knowledge from Spark can be applied to streaming and real-time applications.
This document discusses various techniques for optimizing website performance, including:
1. Network optimizations like compression, HTTP caching, and keeping connections alive.
2. Structuring content efficiently and using tools like YSlow to measure performance.
3. Application caching of pages, database queries, and other frequently accessed content.
4. Database tuning through indexing, query optimization, and offloading text searches.
5. Monitoring resource usage and business metrics to ensure performance meets targets.
Recipes for Running Spark Streaming Applications in Production (Tathagata Das…) by Spark Summit
This document summarizes key aspects of running Spark Streaming applications in production, including fault tolerance, performance, and monitoring. It discusses how Spark Streaming receives data streams in batches and processes them across executors. It describes how driver and executor failures can be handled through checkpointing saved DAG information and write ahead logs that replicate received data blocks. Restarting the driver from checkpoints allows recovering the application state.
NoSql day 2019 - Floating on a Raft - Apache HBase durability with Apache Ratis by Ankit Singhal
In a world with a myriad of distributed storage systems to choose from, the majority of Apache HBase clusters still rely on Apache HDFS. Theoretically, any distributed file system could be used by HBase. One major reason HDFS is predominantly used is the specific durability requirement of HBase's write-ahead log (WAL), which HDFS provides correctly. However, HBase's use of HDFS for WALs can be replaced with sufficient effort.
This talk will cover the design of a "Log Service" which can be embedded inside of HBase and provides the level of durability that HBase requires for WALs. Apache Ratis (incubating) is a library implementation of the RAFT consensus protocol in Java and is used to build this Log Service. It covers the design choices of the Ratis Log Service, comparing and contrasting it to other log-based systems that exist today. Next, it covers how the Log Service "fits" into HBase and the necessary changes to HBase which enable this. Finally, it discusses how the Log Service can simplify the operational burden of HBase.
Replication in MongoDB allows for high availability and scaling of reads. A replica set consists of at least three mongod servers, with one primary and one or more secondaries that replicate from the primary. The primary applies all write operations to its oplog, which is then replicated to the secondaries. If the primary fails, a new primary is elected from the remaining secondaries. Administrative commands help monitor and manage the replica set configuration.
The document discusses Coordinated Restore at Checkpoint (CRaC), a feature of the Java Virtual Machine (JVM) that allows saving the state of a running application and restoring it later to avoid JVM startup overhead. CRaC uses the CRIU userspace checkpoint/restore mechanism and provides a simple API for applications to register resources that need to be notified during checkpoint and restore. This allows restoring application state like open files and sockets. An example demonstrates how CRaC can speed up subsequent runs of an application by restoring a pre-filled cache from a previous checkpoint.
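For illustration, a minimal sketch of the checkpoint/restore callback API described above, using the org.crac package; the cache scenario and every name other than the API itself are hypothetical.

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

// Hypothetical cache that releases its connections before a checkpoint
// and re-establishes them after the process is restored.
public class CheckpointAwareCache implements Resource {
    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Release state that must not be captured in the image (sockets, open files).
        System.out.println("closing connections before checkpoint");
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Re-establish connections and warm the cache after restore.
        System.out.println("reconnecting after restore");
    }

    public static void main(String[] args) throws Exception {
        // Registration keeps the resource reachable so it receives notifications.
        CheckpointAwareCache cache = new CheckpointAwareCache();
        Core.getGlobalContext().register(cache);
        Thread.sleep(60_000); // run; the checkpoint is triggered externally (e.g. via jcmd)
    }
}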
The document describes the architecture of Cassandra, including its startup process, layers (API, Dynamo, database), services (StorageProxy, MessagingService, Gossip), and failure handling. It also details how nodes communicate via the Gossip protocol to exchange state information and maintain a consistent cluster view.
DataStax: Backup and Restore in Cassandra and OpsCenter by DataStax Academy
Backup and restore in Cassandra and OpsCenter covers a range of topics. I will start with a basic overview of Cassandra backup/restore, walking through the operational steps to provide the understanding required to perform an on-disk backup and restore. Expanding on this overview, I'll cover the limitations (including schema requirements) and their impact on the restore process. Further, I'll discuss commit log archiving and point-in-time restore operations. After covering the underlying operations, I'll wrap up with a discussion of how OpsCenter automates this process and leverages S3.
This document discusses parameters for tuning the performance of WebLogic servers. It covers OS-level TCP parameters, JVM heap size and GC logging parameters, WebLogic server-level parameters like work managers, execute queues, and stuck threads, and JDBC and JMS pool parameters. It also provides an overview of different types of garbage collection in the HotSpot JVM.
Container orchestration from theory to practice by Docker, Inc.
"Join Laura Frank and Stephen Day as they explain and examine technical concepts behind container orchestration systems, like distributed consensus, object models, and node topology. These concepts build the foundation of every modern orchestration system, and each technical explanation will be illustrated using SwarmKit and Kubernetes as a real-world example. Gain a deeper understanding of how orchestration systems work in practice and walk away with more insights into your production applications."
Technical Overview of Apache Drill by Jacques Nadeau - MapR Technologies
This document provides a technical overview of Apache Drill, including:
1) The basic query processing workflow involving Drillbits, distributed caching, and query execution.
2) The core modules within each Drillbit, including the SQL parser, optimizer, storage engines, and execution components.
3) How queries progress from SQL to logical and physical plans to distributed execution plans.
4) Technologies used include Java, Netty, Zookeeper, Parquet, and others.
"In this session, Twitter engineer Alex Payne will explore how the popular social messaging service builds scalable, distributed systems in the Scala programming language. Since 2008, Twitter has moved the development of its most critical systems to Scala, which blends object-oriented and functional programming with the power, robust tooling, and vast library support of the Java Virtual Machine. Find out how to use the Scala components that Twitter has open sourced, and learn the patterns they employ for developing core infrastructure components in this exciting and increasingly popular language."
Copper: A high performance workflow engine by dmoebius
COPPER (COmmon Persistable Process Execution Runtime) is an open-source, high-performance workflow engine that persists the workflow instance (process) state into a database. So there is no limit to the runtime of a process: it can run for weeks, months or years. In addition, this strategy leads to crash safety.
A workflow can describe business processes, for example, but any kind of use case is supported. The "modelling" language is Java, which has several advantages:
* with COPPER any Java developer is able to design workflows
* all Java developers like to use Java
* many Java libs can be integrated within COPPER
* many Java tools, like IDEs, can be used
* with COPPER your productivity will be increased when using a workflow engine
* using Java solutions will protect your investment
* COPPER is OpenSource under Apache Licence 2.0
Please visit copper-engine.org for details.
The document discusses various challenges faced and solutions implemented in developing and deploying a mobile-based online marketing project using J2EE technology. Key points:
1. Remote debugging of Tomcat was implemented to allow debugging the integrated project locally. NetBeans was configured for remote debugging.
2. Access logging in Tomcat was improved using the AccessLogValve to log request details for analysis. Extending this valve hides passwords in login requests.
3. Load balancing with Apache HTTP Server as the balancer was set up to improve performance, scalability and high availability across multiple Tomcat nodes with session replication.
Event Processing and Integration with IAS Data Processors by Invenire Aude
Quick introduction to IAS Data Processors. Transport modes, transport drivers (SHM, IBM WebSphere MQ, Files, net, http(s)).
Business logic implementation.
Transaction support.
Data processors can be configured to act as:
Data transformation nodes, using PASCAL-like script language,
Gateways and bridges (e.g. HTTP/JSON and Queues/XML),
SQL Database interfaces using the data mapping script extension.
You can configure and use the Data Processors as single threaded programs but you can define many logic implementations and run them in parallel as threads.
You can choose the transaction support from the three available modes: auto-commit, single phase (independent) commits, distributed two phase commit with XA when the supported coordination software is used.
And last but not least, one can find the Data Processors as a very helpful command line admin's tool.
This document summarizes a presentation about near real-time analytics platforms at Uber and LinkedIn. It discusses use cases for streaming analytics, challenges with scalability and operations, and new platforms developed using Apache Samza and SQL. Key points include how Samza is used to build streaming applications with SQL queries, operators, and support for multi-stage workflows. The platforms aim to simplify deployment and management of streaming jobs through interfaces like AthenaX.
Spark Streaming Recipes and "Exactly Once" Semantics Revised by Michael Spector
This document discusses stream processing with Apache Spark. It begins with an overview of Spark Streaming and its advantages over other frameworks like low latency and rich APIs. It then covers core Spark Streaming concepts like windowing and achieving "exactly once" semantics through checkpointing and write ahead logs. The document presents two examples of using Spark Streaming for analytics and aggregation with transactional and snapshotted approaches. It concludes with notes on deployment with Mesos/Marathon and performance tuning Spark Streaming jobs.
This document provides an overview of Postgres clustering solutions and distributed Postgres architectures. It discusses master-slave replication, Postgres-XC/XL, Greenplum, CitusDB, pg_shard, BDR, pg_logical, and challenges around distributed transactions, high availability, and multimaster replication. Key points include the tradeoffs of different approaches and an implementation of multimaster replication built on pg_logical and a timestamp-based distributed transaction manager (tsDTM) that provides partition tolerance and automatic failover.
Building Production Ready Search Pipelines with Spark and Milvus by Zilliz
Spark is a widely used ETL tool for processing, indexing and ingesting data into the serving stack for search. Milvus is a production-ready, open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
10. INITIALIZE STORAGE SERVICE
Load ring state (unless told not to)
Start gossip & get initial ring info
Set tokens
Setup auth resources
Ensure gossip stabilized
11. STARTUP
Load config
Run preflight checks
Load schema
Clean up local temporary state
Recover CommitLog
Schedule background compactions
Initialize storage service
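Read together, slides 10 and 11 describe an ordered sequence, with the last startup step expanding into the storage-service initialization above. The sketch below is purely illustrative; the class and method names are invented, not the actual Cassandra entry points.

// Purely illustrative: mirrors the order of the startup steps on slides 10 and 11.
public class StartupSketch {
    public static void main(String[] args) {
        step("load config");
        step("run preflight checks");
        step("load schema");
        step("clean up local temporary state");
        step("recover commit log");            // replay mutations that never reached an SSTable
        step("schedule background compactions");
        initializeStorageService();            // expands into the steps from slide 10
    }

    static void initializeStorageService() {
        step("load ring state (unless told not to)");
        step("start gossip and get initial ring info");
        step("set tokens");
        step("set up auth resources");
        step("ensure gossip has stabilized");
    }

    static void step(String description) {
        System.out.println("startup: " + description);
    }
}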
15. MESSAGINGSERVICE
Pre-emptively drops messages when overwhelmed
Dropped if time at execution > send time + timeout
Timeout value dependent on message type
Most client-initiated requests can be dropped
(see MessagingService.DROPPABLE_VERBS)
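The drop rule on this slide is a simple time check: a message remembers when it was created, and an overloaded node discards it if it is already older than its verb's timeout by the time it finally gets executed. A toy sketch with illustrative names (not the actual MessagingService code):

import java.util.concurrent.TimeUnit;

// Illustrative sketch of "dropped if time at execution > send time + timeout".
public class DroppableMessageSketch {
    final long constructionTimeNanos = System.nanoTime();
    final long timeoutMillis;          // per-verb timeout, e.g. the read request timeout
    final boolean droppable;           // only some verbs may be dropped

    DroppableMessageSketch(long timeoutMillis, boolean droppable) {
        this.timeoutMillis = timeoutMillis;
        this.droppable = droppable;
    }

    boolean shouldDrop(long nowNanos) {
        long ageMillis = TimeUnit.NANOSECONDS.toMillis(nowNanos - constructionTimeNanos);
        return droppable && ageMillis > timeoutMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        DroppableMessageSketch read = new DroppableMessageSketch(50, true);
        Thread.sleep(100);
        // By the time an overloaded node would execute this request it has already
        // timed out for the client, so doing the work is wasted effort: drop it.
        System.out.println("drop? " + read.shouldDrop(System.nanoTime())); // true
    }
}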
16. GOSSIP
What it does do:
Disseminates members' state around the cluster
Versioned: generation (per JVM) & version (per value)
Heartbeats: incremented every gossip round
Application state:
Status
Tokens
Release & schema version
DC & Rack
Addresses
Data size
Health
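A rough model of the state being gossiped, using invented class names: one generation per process start, a heartbeat version bumped every gossip round, and a map of versioned application-state values like those listed above.

import java.util.EnumMap;
import java.util.Map;

// Illustrative model of gossiped endpoint state: generation changes per JVM start,
// version increases per value change, heartbeat bumps every gossip round.
public class EndpointStateSketch {
    enum ApplicationState { STATUS, TOKENS, RELEASE_VERSION, SCHEMA, DC, RACK, RPC_ADDRESS, LOAD, SEVERITY }

    record VersionedValue(String value, int version) {}

    final int generation;                         // e.g. seconds since epoch at process start
    int heartbeatVersion = 0;
    final Map<ApplicationState, VersionedValue> applicationState = new EnumMap<>(ApplicationState.class);

    EndpointStateSketch(int generation) { this.generation = generation; }

    void updateHeartbeat() { heartbeatVersion++; }   // once per gossip round

    void set(ApplicationState key, String value, int version) {
        applicationState.put(key, new VersionedValue(value, version));
    }

    public static void main(String[] args) {
        EndpointStateSketch node = new EndpointStateSketch((int) (System.currentTimeMillis() / 1000));
        node.set(ApplicationState.STATUS, "NORMAL", 1);
        node.set(ApplicationState.DC, "dc1", 2);
        node.updateHeartbeat();
        System.out.println(node.applicationState);
    }
}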
17. GOSSIP
What it doesn't do:
Notify about up or down nodes
Propagate schema
Transmit data files
Distribute mutations
23. COORDINATION
Client request arrives at coordinator:
Transformed into actionable command(s):
IReadCommand
IMutation
Coordinator distributes execution around the cluster
Replicas perform commands and respond to coordinator
Gather responses and determine client response
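A simplified sketch of that coordination loop, with invented names (this is not the StorageProxy code): turn the request into a command, fan it out to the replicas, and wait until enough responses have arrived to satisfy the consistency level.

import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative coordinator flow: fan a command out and block for enough responses.
public class CoordinationSketch {
    interface Replica { String execute(String command) throws Exception; }

    static String coordinate(String command, List<Replica> replicas, int required) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(replicas.size());
        CompletionService<String> responses = new ExecutorCompletionService<>(pool);
        try {
            for (Replica replica : replicas)
                responses.submit(() -> replica.execute(command));        // distribute execution
            String result = null;
            for (int i = 0; i < required; i++) {                         // gather CL responses
                Future<String> response = responses.poll(2, TimeUnit.SECONDS);
                if (response == null) throw new TimeoutException("not enough replicas responded");
                result = response.get();
            }
            return result;                                               // becomes the client response
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Replica> replicas = List.of(c -> "row@replica1", c -> "row@replica2", c -> "row@replica3");
        System.out.println(coordinate("read key=42", replicas, 2));      // e.g. QUORUM with RF=3
    }
}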
31. HINTS
Nodes can be down
Writes may timeout
In which case we may hint
Enabled/disabled globally or enabled per-DC
Writing a hint counts towards ConsistencyLevel.ANY
Deliver hints when a node comes back up & periodically
Too many hints in progress for a replica means we bail early
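The hinting decision sketched below is illustrative only: hint when the replica is down (or the write timed out), hinting is enabled for that DC, and the node is not already drowning in pending hints for that replica.

// Illustrative decision: store a hint locally (counting toward CL.ANY only) when the
// replica cannot take the write now, unless hinting is disabled or overloaded.
public class HintDecisionSketch {
    static boolean shouldHint(boolean replicaAvailable, boolean hintsEnabledForDc,
                              int hintsInProgressForReplica, int maxHintsInProgress) {
        if (replicaAvailable) return false;                     // normal write path, no hint needed
        if (!hintsEnabledForDc) return false;                   // disabled globally or for this DC
        return hintsInProgressForReplica < maxHintsInProgress;  // too many pending hints: bail early
    }

    public static void main(String[] args) {
        System.out.println(shouldHint(false, true, 10, 1024));   // true: write a hint
        System.out.println(shouldHint(false, true, 5000, 1024)); // false: overloaded, fail instead
    }
}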
32. LOGGED BATCHES
Determine point of failure by WriteType
org.apache.cassandra.service.StorageProxy
public static void mutateAtomically(Collection<Mutation> mutations,
ConsistencyLevel consistency_level)
CommitLog for batches
Guarantee eventual success of batched statements
Strives to distribute across racks in the local DC
On success, cleanup log entries asynchronously
Failed batches replayed by the nodes holding the logs
WriteType.BATCH_LOG
WriteType.BATCH
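A rough outline of the flow behind that method, with invented helper names rather than the real StorageProxy internals: persist the batch to the batch log on other nodes first, apply the mutations, then clear the log entry asynchronously once they all succeed.

import java.util.Collection;
import java.util.List;
import java.util.UUID;

// Illustrative outline of the logged-batch flow.
public class LoggedBatchSketch {
    record Mutation(String key, String change) {}

    static void mutateAtomically(Collection<Mutation> mutations) {
        UUID batchId = UUID.randomUUID();
        List<String> batchlogReplicas = pickBatchlogReplicas();   // prefer distinct racks in the local DC
        writeBatchlog(batchId, mutations, batchlogReplicas);      // a failure here maps to WriteType.BATCH_LOG
        applyMutations(mutations);                                // a failure here maps to WriteType.BATCH
        removeBatchlogAsync(batchId, batchlogReplicas);           // cleanup once everything succeeded
        // If the coordinator dies mid-way, the nodes holding the batch log replay it later.
    }

    static List<String> pickBatchlogReplicas() { return List.of("rack1-node", "rack2-node"); }
    static void writeBatchlog(UUID id, Collection<Mutation> m, List<String> replicas) {}
    static void applyMutations(Collection<Mutation> m) {}
    static void removeBatchlogAsync(UUID id, List<String> replicas) {}

    public static void main(String[] args) {
        mutateAtomically(List.of(new Mutation("user:1", "name=aaron"), new Mutation("user_by_name:aaron", "id=1")));
    }
}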
35. READ REPAIR DECISION
Apply filter to sorted list of all live replicas
NONE: closest n replicas required by CL
GLOBAL: all live replicas
DC_LOCAL: all local replicas
Add closest n remotes needed to satisfy CL
Default Global Chance: 0.0
Default Local Chance: 0.1
Gives us a list of replicas to send read requests to
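That selection can be read as a filter over the proximity-sorted live replicas; here is a sketch with invented names, not the actual selection code.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative read-repair target selection: apply the decision, then top up to the CL.
public class ReadRepairDecisionSketch {
    enum Decision { NONE, GLOBAL, DC_LOCAL }

    static List<String> targets(List<String> sortedLiveReplicas, List<String> localDcReplicas,
                                int requiredByCl, Decision decision) {
        Set<String> picked = new LinkedHashSet<>();
        switch (decision) {
            case NONE     -> picked.addAll(sortedLiveReplicas.subList(0, requiredByCl)); // closest n only
            case GLOBAL   -> picked.addAll(sortedLiveReplicas);                           // every live replica
            case DC_LOCAL -> picked.addAll(localDcReplicas);                              // all local replicas
        }
        // Add the closest remaining replicas still needed to satisfy the consistency level.
        for (String replica : sortedLiveReplicas) {
            if (picked.size() >= requiredByCl) break;
            picked.add(replica);
        }
        return new ArrayList<>(picked);
    }

    public static void main(String[] args) {
        List<String> live = List.of("dc1-a", "dc1-b", "dc2-a", "dc2-b");
        System.out.println(targets(live, List.of("dc1-a", "dc1-b"), 3, Decision.DC_LOCAL));
        // -> [dc1-a, dc1-b, dc2-a]: local replicas plus one remote to reach the CL
    }
}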
37. LIGHTS, CAMERA, EXECUTION
Fire off each command using read executor
Requests are sent via MessagingService
Closest replica(s) sent full data requests
Others get digest requests
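A compact sketch of that dispatch, with illustrative names: the closest replica(s) get full data reads, the remaining targets get digest reads, which are cheap to ship and compare.

import java.util.List;
import java.util.stream.IntStream;

// Illustrative dispatch: one (or a few) full-data reads to the closest replicas,
// digest-only reads to the rest of the targets.
public class ReadExecutorSketch {
    enum RequestType { DATA, DIGEST }

    record ReadRequest(String replica, RequestType type) {}

    static List<ReadRequest> buildRequests(List<String> targetsSortedByProximity, int dataRequests) {
        return IntStream.range(0, targetsSortedByProximity.size())
                .mapToObj(i -> new ReadRequest(targetsSortedByProximity.get(i),
                        i < dataRequests ? RequestType.DATA : RequestType.DIGEST))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(buildRequests(List.of("dc1-a", "dc1-b", "dc2-a"), 1));
        // -> dc1-a gets a DATA request, dc1-b and dc2-a get DIGEST requests
    }
}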
40. FOREGROUND READ REPAIR
All data requests, no digests
Includes replicas contacted initially
Effectively ConsistencyLevel.ALL
Specialized resolver: RowDataResolver
Retry any short reads
May also perform background Read Repair
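A sketch of what the foreground path amounts to, with invented names (not the actual RowDataResolver): re-read full data from every contacted replica, keep the newest cell per column, and send the winners back to any replica that returned stale data.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative foreground read repair resolution: newest timestamp per column wins.
public class ForegroundReadRepairSketch {
    record Cell(String value, long timestamp) {}

    static Map<String, Cell> resolve(List<Map<String, Cell>> fullDataResponses) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> response : fullDataResponses)
            response.forEach((column, cell) ->
                    merged.merge(column, cell, (a, b) -> a.timestamp() >= b.timestamp() ? a : b));
        return merged;   // newest cell per column wins
    }

    public static void main(String[] args) {
        Map<String, Cell> replica1 = Map.of("name", new Cell("aaron", 10), "city", new Cell("wellington", 5));
        Map<String, Cell> replica2 = Map.of("name", new Cell("aaron", 10), "city", new Cell("auckland", 9));
        System.out.println(resolve(List.of(replica1, replica2)));
        // city resolves to auckland (timestamp 9 > 5); a real resolver would now send a
        // repair mutation for the stale "city" cell back to replica1.
    }
}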