This document discusses high availability techniques for MySQL databases. It defines high availability as having a high "availability" or uptime, measured by metrics like MTBF and MTTR. The document outlines several techniques to improve high availability, including increasing MTBF through monitoring, testing, and redundancy, and decreasing MTTR through failover capabilities and redundancy. It then discusses several high availability technologies for MySQL, including replication, SANs, DRBD, MySQL Cluster, Percona XtraDB Cluster, load balancing, and more. It concludes by saying recommendations are too complex for the slides and to examine suitable solutions based on needs.
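The relationship between MTBF, MTTR and availability mentioned above can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The following is a minimal sketch; the function name and the sample figures are illustrative assumptions, not values from the slides.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up.

    mtbf_hours: mean time between failures (average uptime per failure cycle)
    mttr_hours: mean time to recover (average downtime per failure cycle)
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers: one failure every 1000 hours, one hour to recover.
    base = availability(1000, 1)
    # The two levers named in the abstract: raise MTBF or lower MTTR.
    longer_mtbf = availability(2000, 1)
    faster_recovery = availability(1000, 0.5)
    print(f"baseline:        {base:.6f}")
    print(f"double MTBF:     {longer_mtbf:.6f}")
    print(f"halve MTTR:      {faster_recovery:.6f}")
```

Under this model, doubling MTBF and halving MTTR give almost the same improvement, which is why failover speed is treated as seriously as failure prevention.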
Redis as a Main Database, Scaling and HA (Dave Nielsen)
Iskren Chernev, an independent developer, uses a lot of Redis. In this talk, Iskren will look at a particular Redis use case: using it as the main database (not a cache). Iskren will show how to achieve reasonable guarantees about data integrity, speed, high availability in the event of a failure, and near-unlimited horizontal scalability. This particular approach has proven successful in managing clusters of up to 2400 nodes and storing more than 7TB of data before replication. We'll cover ways to separate your data appropriately across many nodes, performing different types of migrations (from another database, from one cluster to another, scaling migrations, and migrating out of Redis), moving nodes without downtime, and some configuration and monitoring tips.
Group Replication was implemented at Uber to provide high availability across regions without write downtime during database promotions or failures. However, initial testing was unsuccessful as errors occurred on every update. The system was migrated back to the previous setup. Later implementations in production have been more successful, with Group Replication providing increased availability for over 6 months without major incidents. While it provides benefits, certain transaction patterns like concurrent DML can cause failures, and DDL requires write outages or temporary single-primary mode. Overall it is seen as a useful addition where no downtime is allowed and transactions can be retried.
In this presentation we go through the parallel programming possibilities in .NET C#.
First we discuss some core concepts, then we explore the problem space and the related toolbox.
Finally we will have a sample library which implements the same interface in 20+ different ways to show you the alternatives.
Related code can be reached here: https://github.com/peter-csala/parallel-programming-dotnet
Martin Casado discusses the relationship between data center networking overlays and SDN. Overlays are commonly used today to provide functionality like isolation, security, and load balancing over simple network fabrics. SDN can help by controlling the overlay and standardizing interfaces. SDN is also well-suited to configure the underlying fabric, though may not be the best fit for dynamic fabric control planes at large scale. Extensions could make SDN friendlier for both overlays and fabric control.
A 10-minute lightning talk about how to avoid hotspots in Elasticsearch. It goes through the way Elasticsearch decides which node will host your data, as well as how to force it to store the data on the nodes you want.
This document discusses data management in cloud platforms. It begins with an introduction to cloud computing, noting its benefits like reduced costs and ability to scale resources. It then covers cloud characteristics like elasticity and security risks. The document discusses data analysis challenges in clouds like fault tolerance and heterogeneous environments. It focuses on data replication techniques between master and slave nodes and different replication types. It also covers master-slave election processes to select a primary node and considers factors like priority, network partitions, and quick detection of a failed primary.
This document discusses monitoring Cassandra, including an overview of Cassandra, its internal concepts like read/write paths and compactions, and important metrics to monitor. Key metrics to monitor Cassandra's performance include read/write latency, live SSTable count, thread pool pending/completed tasks, and memtable flush count. Operations like compactions and hinted handoff replication should also be monitored. Resource usage metrics like JVM garbage collection time and memory usage are important to monitor as well. Monitoring these metrics helps detect anomalies, optimize performance, and ensure Cassandra's successful operation over the long run.
This document discusses using artificial intelligence to optimize queries in BigQuery databases. It describes the benefits and limitations of managed databases like BigQuery. It then presents alternatives like SQL Server, ElasticSearch and Athena. The document outlines best practices for partitioning, clustering and limiting queries in BigQuery. It demonstrates how an AI optimization engine could predict query costs and perform real-time optimizations to scan less data and provide query recommendations. The goal is to make BigQuery faster, smarter and more efficient.
A short presentation describing the roots and features of the MattockFS forensic file system, implementing page-cache friendly, capability secure, write-once data archive, opportunistic hashing and local message bus implementation geared primarily at digital forensic framework usage.
Apache Cassandra is a free, distributed, open source, and highly scalable NoSQL database that is designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability, and tunable consistency. Cassandra's architecture allows it to spread data across a cluster of servers and replicate across multiple data centers for fault tolerance. It is used by many large companies for applications that require high performance, scalability, and availability.
In Data Engineer's Lunch #54, we will discuss the data build tool, a tool for managing data transformations with config files rather than code. We will be connecting it to Apache Spark and using it to perform transformations.
Accompanying YouTube: https://youtu.be/dwZlYG6RCSY
Cassandra Lunch #87: Recreating Cassandra.api using Astra and Stargate (Anant Corporation)
In Cassandra Lunch #87, we will work on using AstraDB's included Stargate API layer as a substitute for the hand-written Node and Python APIs in our Cassandra.api project.
The document discusses Mantis, a reactive stream processing system for providing scalable insights in complex operational environments. It describes key requirements like cost sensitivity, high throughput, and resilience. Mantis meets these challenges through elastic clusters and jobs, non-blocking processing, monitoring and replacing failed servers, and backpressure strategies to handle different source types. The goal is to help operators manage comprehension of their environments as complexity increases.
Erich Ess, CTO of SimpleRelevance, introduces the Spark distributed computing platform and explains how to integrate it with Cassandra. He demonstrates running a distributed analytic computation on a data set stored in Cassandra.
This document discusses trends in computing and data architectures. It notes that hardware is becoming more virtualized through containers and functions, bringing code closer to storage. Meanwhile, data structures are becoming more distributed through databases, distributed databases, and distributed ledgers. Computing scopes are decreasing while storage scopes are increasing. This is because data is inert but code can be more tightly controlled at smaller scopes. The document also discusses challenges of integrating microservices with data, including consistency, and notes the importance of people and processes ("metadata") alongside technology.
Strata Beijing 2017: Jumpy, a python interface for nd4j (Adam Gibson)
GPUs should complement, not replace, the Hadoop ecosystem for big data workloads. Replacing the entire big data stack would be too costly. The presenter believes GPUs are best suited for accelerated computation and a few other use cases to gain an initial foothold in the market. Existing Python interfaces to machine learning frameworks rely too heavily on network communication and serialization, introducing significant overhead. Nd4j and Jumpy provide alternatives that use direct C++ interfaces and pointers for lower latency between Python and deep learning operations on CPU and GPU.
A Comprehensive Introduction to Apache Cassandra.
Agenda:
- What is NoSQL?
- What is Cassandra?
- Architecture
- Data Model
- Key Features and Benefits
- Cassandra Tools
-- CQL
-- Nodetool
-- DataStax Opscenter
- Who’s using Cassandra?
Cassandra is a distributed, decentralized, wide column store NoSQL database modeled after Amazon's Dynamo and Google's Bigtable. It provides high availability with no single point of failure, elastic scalability and tunable consistency. Cassandra uses consistent hashing to partition and distribute data across nodes, vector clocks to track data versions for consistency, and Merkle trees to detect and repair inconsistencies between replicas.
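To illustrate the consistent hashing idea mentioned above, here is a minimal sketch of a token ring with virtual nodes. It is not Cassandra's implementation (Cassandra uses its own partitioners and token allocation); the `HashRing` class, the MD5-based token function and the node names are assumptions chosen only to show the technique.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring: each node owns several tokens on a circle."""

    def __init__(self, nodes=(), vnodes: int = 8):
        self._vnodes = vnodes
        self._tokens = []   # sorted token positions on the ring
        self._owners = {}   # token -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _token(key: str) -> int:
        # Illustrative hash: first 8 bytes of MD5 as an unsigned integer.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        # Place `vnodes` tokens for this node at pseudo-random ring positions.
        for i in range(self._vnodes):
            token = self._token(f"{node}#{i}")
            bisect.insort(self._tokens, token)
            self._owners[token] = node

    def owner(self, key: str) -> str:
        if not self._tokens:
            raise LookupError("ring has no nodes")
        # A key belongs to the first token at or after its hash, wrapping around.
        idx = bisect.bisect_left(self._tokens, self._token(key)) % len(self._tokens)
        return self._owners[self._tokens[idx]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    keys = [f"user:{i}" for i in range(1000)]
    before = {k: ring.owner(k) for k in keys}
    ring.add("node-d")
    moved = [k for k in keys if ring.owner(k) != before[k]]
    print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The point of the design is that adding a node only takes keys over from existing owners; keys never shuffle between the old nodes, which keeps rebalancing traffic proportional to the share of the ring the new node claims.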
Concepts, architectures and uses of distributed databases. A gentle introduction to get you up to speed and understand the value and potential of distributed databases.
The document discusses best practices for scalability. It covers definitions of concurrency, throughput, and extensibility. It then discusses techniques for handling scale like throttling, caching, stateful vs stateless design, asynchronous vs synchronous processing, and service-oriented architecture. Specific techniques are discussed for scaling applications at the client, server, database, and for big data.
An overview of how recent changes in technology have changed priorities for databases to distributed systems, and how you can preserve consistency in distributed data stores like Riak.
OpenStackTage Cologne - OpenStack at 99.999% availability with Ceph (Danny Al-Gaaf)
High availability is an important and frequently discussed topic for clouds at the infrastructure level. There are several concepts for building an HA-ready OpenStack, and software-defined storage such as Ceph is itself highly available, with no single point of failure.
But what about HA if you bring OpenStack and Ceph together? What are the dependencies between them and how do they influence the availability of your cloud instances from the tenant or application point of view?
How does the design of a classic highly available data center (e.g. with two fire compartments, power backup, and redundant power and network lines) impact your cluster setup? There are many different scenarios of potential failures. What does this mean for building and managing failure zones, especially for technologies like Ceph, which must be able to form a quorum to keep running?
This document provides a summary of a presentation on practical MySQL tuning. It discusses measuring critical system resources like CPU, memory, I/O and network usage to identify bottlenecks. It also covers rough tuning of MySQL parameters like the InnoDB buffer pool size, log file size and key buffer size. Further tuning includes application optimizations like query tuning with EXPLAIN, index tuning, and schema design. The presentation also discusses scaling MySQL through approaches like caching, sharding, replication and optimizing architecture and data distribution. Regular performance monitoring is emphasized to simulate increased load and aid capacity planning.
99.999% Available OpenStack Cloud - A Builder's Guide (Danny Al-Gaaf)
This document discusses achieving 99.999% availability for OpenStack cloud services running on Ceph storage. It describes Deutsche Telekom's motivation to build highly available NFV clouds across multiple data centers. Various failure scenarios are considered, such as power, network, hardware failures, and disasters. Setting up OpenStack and Ceph for high availability requires redundant components and careful planning. Ensuring quorum across Ceph monitors and OpenStack services is critical. Achieving five nines availability requires distributing applications across multiple regions to tolerate data center or regional failures.
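A short calculation shows why five nines pushes designs toward multiple regions. The sketch below computes the yearly downtime budget for a target availability and the combined availability of redundant sites. It assumes sites fail independently and that failover is instantaneous, which real deployments only approximate; the function names and the 99.9% per-region figure are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


def redundant_availability(site_availability: float, sites: int) -> float:
    """Availability when the service is up as long as any one site is up.

    Assumes independent site failures (an idealization).
    """
    if sites < 1:
        raise ValueError("need at least one site")
    return 1.0 - (1.0 - site_availability) ** sites


if __name__ == "__main__":
    print(f"five nines budget: {downtime_minutes_per_year(0.99999):.2f} min/year")
    # Hypothetical: each region alone reaches only three nines.
    for n in (1, 2, 3):
        a = redundant_availability(0.999, n)
        print(f"{n} region(s): {a:.9f} -> {downtime_minutes_per_year(a):.3f} min/year")
```

With a budget of roughly five minutes per year, a single data center outage can consume the whole allowance, so the target is only plausible when an entire site can fail without taking the application down.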
John Hugg presented on building an operational database for high-performance applications. Some key points:
- He set out to reinvent OLTP databases to be 10x faster by leveraging multicore CPUs and partitioning data across cores.
- The database, called VoltDB, uses Java for transaction management and networking while storing data in C++ for better performance.
- It partitions data and transactions across server cores for parallelism. Global transactions can access all partitions transactionally.
- VoltDB is well-suited for fast data applications like IoT, gaming, ad tech which require high write throughput, low latency, and global understanding of live data.
Benchmarks, performance, scalability, and capacity what's behind the numbers... (james tong)
Baron Schwartz gave a presentation on analyzing database performance beyond surface-level metrics and benchmarks. He discussed how ideal benchmarks provide full system specifications and metrics over time to understand response times and throughput. Little's Law and queueing theory can predict concurrency, response times, and capacity given arrival rates and service times. While tools like Erlang C model queues, the assumptions must be validated. True scalability is nonlinear due to bottlenecks, and debunking performance claims requires examining raw data.
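The two queueing results named above can be sketched in a few lines. Little's Law relates average concurrency to arrival rate and response time, and the Erlang C formula gives the probability that a request has to queue in an M/M/c system. The function names and sample workload are assumptions for illustration, and, as the talk stresses, the model's assumptions (random arrivals, exponential service times) need to be validated against real data.

```python
import math


def concurrency(arrival_rate: float, response_time: float) -> float:
    """Little's Law: average requests in the system = arrival rate * time in system."""
    return arrival_rate * response_time


def erlang_c(servers: int, offered_load: float) -> float:
    """Probability that an arriving request must wait in an M/M/c queue.

    offered_load is arrival_rate / service_rate (in Erlangs) and must be
    smaller than the number of servers for the queue to be stable.
    """
    if servers < 1 or not 0 <= offered_load < servers:
        raise ValueError("need servers >= 1 and 0 <= offered_load < servers")
    top = (offered_load ** servers / math.factorial(servers)) * (
        servers / (servers - offered_load)
    )
    bottom = sum(offered_load ** k / math.factorial(k) for k in range(servers)) + top
    return top / bottom


if __name__ == "__main__":
    # Hypothetical workload: 100 queries/s with a 50 ms average response time.
    print(f"average concurrency: {concurrency(100, 0.05):.1f}")
    # Same offered load of 4 Erlangs served by a growing number of workers.
    for c in (5, 6, 8):
        print(f"{c} workers: P(wait) = {erlang_c(c, 4.0):.3f}")
```

The waiting probability climbs steeply as offered load approaches the number of servers, which is one reason scalability measured from benchmarks is nonlinear.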
Benchmarks, performance, scalability, and capacity what's behind the numbers (Justin Dorfman)
Baron Schwartz gave a presentation on analyzing database performance beyond surface-level metrics and benchmarks. He discussed how ideal benchmarks provide full system specifications and metrics over time to understand response times and throughput. Little's Law and queueing theory can predict concurrency, response times, and capacity given arrival rates and service times. While tools like Erlang C model queues, the assumptions must be validated. True scalability is nonlinear due to bottlenecks, and debunking performance claims requires examining raw data.
Modern computationally intensive tasks are rarely bottlenecked by the absolute performance of your processor cores. The real bottleneck in 2013 is getting data out of memory. CPU caches are designed to alleviate the difference in performance between CPU core clock speed and main memory clock speed, but developers rarely understand how this interaction works or how to measure or tune their application accordingly. This session aims to address this by:
• Describing how CPU caches work in the latest Intel hardware
• Showing what and how to measure in order to understand the caching behavior of software
• Giving examples of how this affects Java program performance and what can be done to address poor cache utilization
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017 (Alex Robinson)
Until recently, developers have had to deal with some serious tradeoffs when picking a database technology. One could pick a SQL database and deal with its eventual scaling problems, or pick a NoSQL database and work around its lack of transactions, strong consistency, and/or secondary indexes. However, a new class of distributed database engines is emerging that combines the transactional consistency guarantees of traditional relational databases with the horizontal scalability and high availability of popular NoSQL databases.
In this talk, we'll examine the history of databases to see how we got here, covering the motivations for this new class of systems and why developers should care about them. We'll then take a deep dive into the key design choices behind one open source distributed SQL database, CockroachDB, that enable it to offer such properties and compare them to past SQL and NoSQL designs. We will look specifically at how to achieve the easy deployment and management of a scalable, self-healing, strongly-consistent database with techniques such as dynamic sharding and rebalancing, consensus protocols, lock-free transactions, and more.
This document discusses using artificial intelligence to optimize queries in BigQuery databases. It describes the benefits and limitations of managed databases like BigQuery. It then presents alternatives like SQL Server, ElasticSearch and Athena. The document outlines best practices for partitioning, clustering and limiting queries in BigQuery. It demonstrates how an AI optimization engine could predict query costs and perform real-time optimizations to scan less data and provide query recommendations. The goal is to make BigQuery faster, smarter and more efficient.
A short presentation describing the roots and features of the MattockFS forensic file system, implementing page-cache friendly, capability secure, write-once data archive, opportunistic hashing and local message bus implementation geared primarily at digital forensic framework usage.
Apache Cassandra is a free, distributed, open source, and highly scalable NoSQL database that is designed to handle large amounts of data across many commodity servers. It provides high availability with no single point of failure, linear scalability, and tunable consistency. Cassandra's architecture allows it to spread data across a cluster of servers and replicate across multiple data centers for fault tolerance. It is used by many large companies for applications that require high performance, scalability, and availability.
In Data Engineer's Lunch #54, we will discuss the data build tool, a tool for managing data transformations with config files rather than code. We will be connecting it to Apache Spark and using it to perform transformations.
Accompanying YouTube: https://youtu.be/dwZlYG6RCSY
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Data Engineer’s Lunch Weekly at 12 PM EST Every Monday:
https://www.meetup.com/Data-Wranglers-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
Cassandra Lunch #87: Recreating Cassandra.api using Astra and StargateAnant Corporation
In Cassandra Lunch #87, we will work on using AstraDBs included Stargate API layer to substitute for the written Node and Python APIs in our Cassandra.api project.
Accompanying YouTube: Coming Soon!
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Cassandra Lunch Weekly at 12 PM EST Every Wednesday: https://www.meetup.com/Cassandra-DataStax-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Cassandra.Lunch:
https://github.com/Anant/Cassandra.Lunch
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
The document discusses Mantis, a reactive stream processing system for providing scalable insights in complex operational environments. It describes key requirements like cost sensitivity, high throughput, and resilience. Mantis meets these challenges through elastic clusters and jobs, non-blocking processing, monitoring and replacing failed servers, and backpressure strategies to handle different source types. The goal is to help operators manage comprehension of their environments as complexity increases.
Erich Ess CTO of SimpelRelevance introduces the Spark distributed computing platform and explains how to integrate it with Cassandra. He demonstrates running a distributed analytic computation on a data-set stored in Cassandra
This document discusses trends in computing and data architectures. It notes that hardware is becoming more virtualized through containers and functions, bringing code closer to storage. Meanwhile, data structures are becoming more distributed through databases, distributed databases, and distributed ledgers. Computing scopes are decreasing while storage scopes are increasing. This is because data is inert but code can be more tightly controlled at smaller scopes. The document also discusses challenges of integrating microservices with data, including consistency, and notes the importance of people and processes ("metadata") alongside technology.
Strata Beijing 2017: Jumpy, a python interface for nd4jAdam Gibson
GPUs should complement, not replace, the Hadoop ecosystem for big data workloads. Replacing the entire big data stack would be too costly. The presenter believes GPUs are best suited for accelerated computation and a few other use cases to gain an initial foothold in the market. Existing Python interfaces to machine learning frameworks rely too heavily on network communication and serialization, introducing significant overhead. Nd4j and Jumpy provide alternatives that use direct C++ interfaces and pointers for lower latency between Python and deep learning operations on CPU and GPU.
A Comprehensive Introduction to Apache Cassandra.
Agenda:
- What is NoSQL?
- What is Cassandra?
- Architecture
- Data Model
- Key Features and Benefits
- Cassandra Tools
-- CQL
-- Nodetool
-- DataStax Opscenter
- Who’s using Cassandra?
Cassandra is a distributed, decentralized, wide column store NoSQL database modeled after Amazon's Dynamo and Google's Bigtable. It provides high availability with no single point of failure, elastic scalability and tunable consistency. Cassandra uses consistent hashing to partition and distribute data across nodes, vector clocks to track data versions for consistency, and Merkle trees to detect and repair inconsistencies between replicas.
Concepts, architectures and uses of distributed databases. A gentle introduction to get you up to speed and understand the value and potential of distributed databases.
The document discusses best practices for scalability. It covers definitions of concurrency, throughput, and extensibility. It then discusses techniques for handling scale like throttling, caching, stateful vs stateless design, asynchronous vs synchronous processing, and service-oriented architecture. Specific techniques are discussed for scaling applications at the client, server, database, and for big data.
An overview of how recent changes in technology have changed priorities for databases to distributed systems, and how you can preserve consistency in distributed data stores like Riak.
OpenStackTage Cologne - OpenStack at 99.999% availability with CephDanny Al-Gaaf
High availability is a very important and frequently discussed topic for clouds at the infrastructure level. There are several concepts to provide a HA-ready OpenStack and also software defined storage like Ceph is highly available with no single point of failure.
But what about HA if you bring OpenStack and Ceph together? What are the dependencies between them and how do they influence the availability of your cloud instances from the tenant or application point of view?
How does the design of your classic high-available data center, e.g. with two fire compartments, power backup, and redundant power and network lines impact your cluster setup? There are many different scenarios of potential failures. What does this mean regarding building and managing failure zones, especially in case of technologies like Ceph which need to be able to build a quorum to keep up running?
This document provides a summary of a presentation on practical MySQL tuning. It discusses measuring critical system resources like CPU, memory, I/O and network usage to identify bottlenecks. It also covers rough tuning of MySQL parameters like the InnoDB buffer pool size, log file size and key buffer size. Further tuning includes application optimizations like query tuning with EXPLAIN, index tuning, and schema design. The presentation also discusses scaling MySQL through approaches like caching, sharding, replication and optimizing architecture and data distribution. Regular performance monitoring is emphasized to simulate increased load and aid capacity planning.
99.999% Available OpenStack Cloud - A Builder's GuideDanny Al-Gaaf
This document discusses achieving 99.999% availability for OpenStack cloud services running on Ceph storage. It describes Deutsche Telekom's motivation to build highly available NFV clouds across multiple data centers. Various failure scenarios are considered, such as power, network, hardware failures, and disasters. Setting up OpenStack and Ceph for high availability requires redundant components and careful planning. Ensuring quorum across Ceph monitors and OpenStack services is critical. Achieving five nines availability requires distributing applications across multiple regions to tolerate data center or regional failures.
John Hugg presented on building an operational database for high-performance applications. Some key points:
- He set out to reinvent OLTP databases to be 10x faster by leveraging multicore CPUs and partitioning data across cores.
- The database, called VoltDB, uses Java for transaction management and networking while storing data in C++ for better performance.
- It partitions data and transactions across server cores for parallelism. Global transactions can access all partitions transactionally.
- VoltDB is well-suited for fast data applications like IoT, gaming, ad tech which require high write throughput, low latency, and global understanding of live data.
Benchmarks, performance, scalability, and capacity what s behind the numbers...james tong
Baron Schwartz gave a presentation on analyzing database performance beyond surface-level metrics and benchmarks. He discussed how ideal benchmarks provide full system specifications and metrics over time to understand response times and throughput. Little's Law and queueing theory can predict concurrency, response times, and capacity given arrival rates and service times. While tools like Erlang C model queues, the assumptions must be validated. True scalability is nonlinear due to bottlenecks, and debunking performance claims requires examining raw data.
Modern computationally intensive tasks are rarely bottlenecked by the absolute performance of your processor cores. The real bottleneck in 2013 is getting data out of memory. CPU caches are designed to alleviate the difference in performance between CPU core clock speed and main memory clock speed, but developers rarely understand how this interaction works or how to measure or tune their application accordingly. This session aims to address this by:
• Describing how CPU caches work in the latest Intel hardware
• Showing what and how to measure in order to understand the caching behavior of software
• Giving examples of how this affects Java program performance and what can be done to address poor cache utilization
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017Alex Robinson
Until recently, developers have had to deal with some serious tradeoffs when picking a database technology. One could pick a SQL database and deal with their eventual scaling problems or pick a NoSQL database and have to work around their lack of transactions, strong consistency, and/or secondary indexes. However, a new class of distributed database engines is emerging that combines the transactional consistency guarantees of traditional relational databases with the horizontal scalability and high availability of popular NoSQL databases.
In this talk, we'll examine the history of databases to see how we got here, covering the motivations for this new class of systems and why developers should care about them. We'll then take a deep dive into the key design choices behind one open source distributed SQL database, CockroachDB, that enable it to offer such properties and compare them to past SQL and NoSQL designs. We will look specifically at how to achieve the easy deployment and management of a scalable, self-healing, strongly-consistent database with techniques such as dynamic sharding and rebalancing, consensus protocols, lock-free transactions, and more.
Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes.Raghavendra Prabhu
The talk presented at MySQL & Friends devroom at FOSDEM 2016 in Brussels: https://fosdem.org/2016/schedule/event/clusternaut/
Devroom: https://fosdem.org/2016/schedule/track/mysql_and_friends/
Software Design Practices for Large-Scale AutomationHao Xu
Design practices for large-scale, high-performance, distributed system for complex algorithms such as graph, optimization, prediction, and machine learning etc.
Strategies and techniques to optimize Kafka brokers and producers to minimize data loss under huge traffic volume, limited configuration options, and a less-than-ideal, constantly changing environment, while balancing against cost.
Microservices, Distributed Computing and CAP Theoremkloia
This document discusses microservices, distributed computing, and the CAP theorem. It provides motivation for scaling applications using microservices and distributed systems by discussing increasing internet traffic over time. It introduces concepts of vertical and horizontal scaling. It discusses distributed system fundamentals of consistency, availability, and partition tolerance. It explains the CAP theorem and tradeoffs between consistency, availability and partition tolerance. It discusses data management approaches of ACID and BASE and types of consistency like strong, weak and eventual consistency.
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
This document summarizes three presentations from a Cassandra Meetup:
1. Jason Cacciatore discussed monitoring Cassandra health at scale across hundreds of clusters and thousands of nodes using the reactive stream processing system Mantis.
2. Minh Do explained how Cassandra uses the gossip protocol for tasks like discovering cluster topology and sharing load information. Gossip also has limitations and race conditions that can cause problems.
3. Chris Kalantzis presented Cassandra Tickler, an open source tool he created to help repair operations that get stuck by running lightweight consistency checks on an old Cassandra version or a node with space issues.
Similar to Pass Elk: CAP Theorem since 90s and Beyond (20)
Orchestrating Cassandra with Kubernetes Operator and PaaSTARaghavendra Prabhu
Video URL: https://youtu.be/GjI6MUz7AyE
This is the slide deck of the Percona Live Online 2020 talk given by me in May 2020: https://www.percona.com/resources/videos/orchestrating-cassandra-kubernetes-operator-and-yelp-paasta-percona-live-online
The talk delves into the architecture of our Cassandra Kubernetes Operator and the multi-region multi-AZ clusters it manages, and strategies we have in place for safe rollouts and zero-downtime migration.
This talk was given at Cassandra London meetup: https://www.meetup.com/Cassandra-London/events/267271963/ . The talk is about orchestration of Cassandra with our Kubernetes Operator and Yelp PaaSTA. We also outline some of the opportunities and challenges associated with this architecture.
Youtube link: https://www.youtube.com/watch?v=JqAILFkkibA
This talk is about orchestration of Cassandra on Kubernetes with Cassandra Operator and Yelp's Platform-as-a-Service: PaaSTA. The talk focusses specifically on the internals of cassandra operator and its core reconcile loop for reconciliation of cluster state and on-disk configuration.
This is a talk about safe and high velocity automation on AWS (Amazon Web Services) with AWS Systems Manager, and is applicable for use cases such as reliability engineering and deployment automation.
Orchestrating Cassandra with Kubernetes: Challenges and OpportunitiesRaghavendra Prabhu
This is a talk about orchestration of Cassandra with cassandra operator, kubernetes and Yelp PaaSTA (https://github.com/Yelp/paasta).
The talk was presented at Computer Laboratory, University of Cambridge as part of the Engineering, Science and Technology Event (https://www.careers.cam.ac.uk/recruiting/event2Tech.asp) in November 2019.
This talk is about Taskerman, a distributed cluster task manager built on top of AWS SQS, Zookeeper and Yelp PaaSTA. The talk was given at Imperial College, London as part of its 'Application of Computing in Industry' series: http://www.imperial.ac.uk/computing/industry/aci/yelp/
Talk given on the state of NUMA with Java databases such as Cassandra, how it can be improved or ameliorated, and how it compares with traditional storage engines.
Clusternaut: Orchestrating Percona XtraDB Cluster with KubernetesRaghavendra Prabhu
Raghavendra Prabhu presented on orchestrating Percona XtraDB Cluster (PXC) with Kubernetes. Some key points:
- Kubernetes provides horizontal scaling, self-healing, automated rollouts/rollbacks, service discovery, storage orchestration and more.
- In Kubernetes, PXC nodes would be deployed as pods with a replication controller to maintain a set number of pods. Services provide load balancing to the pods.
- Demonstrated deploying a basic PXC cluster on Kubernetes, including creating a network, cluster, service, replicating pods from a template, and exposing ports.
- Challenges include load balancing for state transfers between nodes and ensuring nodes are
Gone are the days when companies were strictly colocated in a single office. Distributed workplaces are gradually becoming the norm rather than the exception, so it is essential that we talk about them more.
So, this talk is essentially about:
a) Productivity and working from home.
b) Scheduling flexibility.
c) Challenges in communication and ways to overcome them.
d) Ways of getting such a job and Open Source.
e) Measuring work and micro-management
f) Feeling of detachment and workarounds for it.
To sum up, I will make this talk a very informative and entertaining one, as a lightning talk ought to be.
Securing databases with systemd for containers and services Raghavendra Prabhu
Data is the most valuable entity associated with a system, particularly when it is a sensitive one. Not only are there threats associated with physical access to the box, but also ones where logical access suffices - sql injections etc.
Vulnerabilities like shellshock and heartbleed have also shown that an exploit in one component can also be used to access others through buffer overflows, memory overruns etc. and/or impact the immunity of system severely.
This is where the "Principle of least privilege" comes into play. Wikipedia describes it as requiring that "in a particular abstraction layer of a computing environment, every module (such as a process, a user or a program depending on the subject) must be able to access only the information and resources that are necessary for its legitimate purpose".
Dock'em: Distributed Systems Testing with NetEm and Docker Raghavendra Prabhu
This talk is about distributed systems testing of Galera with NetEm and Docker!
Video of the talk: https://www.youtube.com/watch?v=YBuuvhSO38s&list=PLctlsn9Gs8wbx47tuhxuNytdrsDf_LWI2&index=1
Playlist: https://www.youtube.com/playlist?list=PLctlsn9Gs8wbx47tuhxuNytdrsDf_LWI2
Galera with Docker: How Synchronous Replication and Linux Containers mesh tog...Raghavendra Prabhu
How Galera (Synchronous replication plugin for Percona XtraDB Cluster) can be used with Docker (or linux containers in general) to 'mesh' well.
Video of the talk: https://www.youtube.com/watch?v=3A8EF549Q3Y&list=PLctlsn9Gs8wbx47tuhxuNytdrsDf_LWI2&index=2
Playlist: http://www.youtube.com/playlist?list=PLctlsn9Gs8wbx47tuhxuNytdrsDf_LWI2
Jutsu or Dô: Open documentation: continuous process than a body Raghavendra Prabhu
The document discusses various factors to consider for effective documentation of open source projects. It emphasizes that lucid documentation can help with rapid community growth, attracting more contributors, enhancing code quality, and aiding bug fixes. Conversely, poor documentation can repel users, lead to less understood code, slow project growth, and cause spurious bug reports. Some highlighted factors include keeping documentation up-to-date, using version control, integrating feedback, examples to aid learning, and considering different user types like end users, developers and architects.
Corpus collapsum: Partition tolerance of Galera in a noisy high load environmentRaghavendra Prabhu
This is the talk given at Highload++ 2014 in Moscow, Russia. The topic was partition tolerance testing of Galera in a noisy high load environment with NetEm and Docker.
Corpus collapsum: Partition tolerance of Galera put to testRaghavendra Prabhu
This is the talk given at RICON 2014 (ricon.io) on partition tolerance testing of Galera with docker and netem.
Video: https://www.youtube.com/watch?v=xRD6A8TY_Uw
Link to the talk: http://ricon.io/event-details/index.html#corpus-collapsum
Acidic clusters - Review of contemporary ACID-compliant databases with synchr...Raghavendra Prabhu
This talk reviews database clusters of our time which employ synchronous replication while being ACID compliant. ACID compliance implies ability to support transactions across nodes. As part of this talk, PXC (Percona XtraDB Cluster)/Galera, Google F1 based on Spanner/CFS and MySQL Cluster will be considered. Primary objective here is to expound features of each in order to highlight differentiating factors and commonality between them.
Running virtualized Galera instances for fun and profitRaghavendra Prabhu
The document discusses running virtualized Galera instances for high availability and discusses how Galera and virtualization can work together. It covers how Galera works with synchronous replication, popular virtualization solutions like KVM and containers, deploying Galera in virtualized environments including initialization, operations, storage, and networking considerations, and concludes by taking questions.
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Unveiling the Advantages of Agile Software Development.pdfbrainerhub1
Learn about Agile Software Development's advantages. Simplify your workflow to spur quicker innovation. Jump right in! We have also discussed the advantages.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest Neo4j innovations, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with connected data and generative AI.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Measures in SQL (SIGMOD 2024, Santiago, Chile)Julian Hyde
SQL has attained widespread adoption, but Business Intelligence tools still use their own higher level languages based upon a multidimensional paradigm. Composable calculations are what is missing from SQL, and we propose a new kind of column, called a measure, that attaches a calculation to a table. Like regular tables, tables with measures are composable and closed when used in queries.
SQL-with-measures has the power, conciseness and reusability of multidimensional languages but retains SQL semantics. Measure invocations can be expanded in place to simple, clear SQL.
To define the evaluation semantics for measures, we introduce context-sensitive expressions (a way to evaluate multidimensional expressions that is consistent with existing SQL semantics), a concept called evaluation context, and several operations for setting and modifying the evaluation context.
A talk at SIGMOD, June 9–15, 2024, Santiago, Chile
Authors: Julian Hyde (Google) and John Fremlin (Google)
https://doi.org/10.1145/3626246.3653374
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed own Web Components of your design system which are created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like with standard UI5 controls, and can be bound with data binding. Learn how you can also make use of the Web Components base class in OpenUI5/SAPUI5 to also integrate your Web Components and get inspired by the solution to generate a custom UI5 library providing the Web Components control wrappers for the native ones.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in a quick time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
8. Why distribute?
● Latency
○ Being closer to the user
● Availability
○ Elastic infrastructure
○ Resilience to external factors
○ Failover
● Performance:
○ Moore’s law
○ Scale-out
○ Scale-up
● Security
9. CAP Theorem
● Started as a principle from Eric Brewer in 1998.
● Later a conjecture.
● Wasn’t even for “distributed” systems.
● Proven in 2002!
“Harvest, Yield, and Scalable Tolerant Systems” =>
14. CA system
● Single node - Not a distributed system
● Multiple nodes - all unavailable during network partition: not a distributed system
● No network partitions - Unicorns will be seen too if this is true
15. Partitions
● “Total or Partial Loss of Network”
○ Slow Network
● Partitions are common
○ The Network is Reliable
■ “Five racks going wonky (40-80 machines seeing 50 percent packet loss).”
○ Correlated vs Independent failures
● What is not a partition:
○ Failed nodes
○ Degraded nodes: c.f. Availability
16. CAP theorem in tldr (kinda..)
● Constraints imposed when a single-node data store needs to be distributed over an asynchronous network
● “In presence of partitions, a system has to choose between either C (consistency) or A (Availability)”: Partition Decision
○ Consistency: Single-copy consistency aka Linearizability
○ Availability: Response from a healthy node according to specifications.
● “In absence of partitions? - whatever the system is meant to be”
○ A.C.I.D consistency?
○ PACELC
● ACID and CAP
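The partition decision on this slide can be illustrated with a toy replica. This is a minimal sketch, not taken from the talk: the `Replica` class, its `mode` flag and the `Unavailable` exception are hypothetical names, and a real system detects partitions and tracks freshness in far more elaborate ways.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale reply."""


class Replica:
    """Toy replica that makes the CAP partition decision (hypothetical model)."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None          # last value this replica has seen
        self.partitioned = False   # True when cut off from the other replicas

    def write(self, value) -> None:
        # A CP replica rejects writes it cannot replicate to its peers.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot replicate during partition")
        self.value = value

    def read(self):
        # A CP replica cannot prove its copy is the latest, so it gives up A.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm latest value during partition")
        # An AP replica answers from local state, which may be stale (gives up C).
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.write("v1")
    replica.partitioned = True

print(ap.read())  # answers, possibly stale
try:
    cp.read()
except Unavailable as exc:
    print("CP replica refused:", exc)
```

Outside a partition both replicas behave identically here, which is the gap that PACELC addresses.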
20. PACELC: CAP <-> ELC
● What’s missing in CAP: Latency!
● “In case of network partitioning (P) in a distributed computer system, one has to choose between availability (A) and consistency (C) (as per the CAP theorem), but else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).”
● Compromises a partition-safe data store has to make when there are no partitions.
21. Systems at Yelp
● Zookeeper: PC + EC (sync) / EL (default)
● Cassandra (and Dynamo-like): AP + EL/EC (tunable)
○ R+W < or > N
● Elasticsearch: AP + EL
● PNUTS/Sherpa: PC + EL
● MySQL: CP + EC (multi in causal) / EL (async)
● What about CP + EC systems?
○ VoltDB
○ HBase / BigTable
○ Megastore
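The `R+W < or > N` note for Cassandra and Dynamo-like stores refers to the usual quorum arithmetic: with N replicas, a read quorum of R and a write quorum of W, every read overlaps the latest acknowledged write only when R + W > N. The sketch below checks that rule by brute force for small clusters; the function names are illustrative and not part of any database API.

```python
from itertools import combinations


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """The arithmetic rule: read and write quorums must overlap."""
    return r + w > n


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible read set share a replica with every write set?"""
    replicas = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(replicas, r)
        for writes in combinations(replicas, w)
    )


# Compare the rule against the exhaustive check for a few configurations.
for n, r, w in [(3, 1, 1), (3, 2, 2), (3, 1, 3), (5, 2, 3)]:
    assert quorums_always_overlap(n, r, w) == is_strict_quorum(n, r, w)
    kind = "EC-leaning (overlap)" if is_strict_quorum(n, r, w) else "EL-leaning (no guaranteed overlap)"
    print(f"N={n} R={r} W={w}: {kind}")
```

Smaller R and W mean fewer replicas to wait for, which is the latency side of the EL/EC trade-off on this slide.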
22. Beyond CAP
● CAP is reductionist.
○ PACELC
● Intelligent clients
● Linearizability is not a requirement in most cases.
○ Availability is more important
○ Highly Available Transactions
○ Probabilistic Bounded Staleness
● Partition Management
23. Intelligent clients
● Conflict resolution
○ LWW is prone to data loss
● Write to master, read from master for a while
○ Stickiness - “Dirty” Cookie
● Multi-master but segmented
● State/Session management by clients
○ Thick API pattern
● Google Docs under partition
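To see why last-write-wins (LWW) conflict resolution is prone to data loss, consider two clients updating the same key on opposite sides of a partition. The sketch below is illustrative only: the `Write` record, the timestamps and the cart example are assumptions, not something shown in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    items: frozenset   # contents of a shopping cart (hypothetical example)
    timestamp: float   # wall-clock time recorded by the writer


def lww_merge(a: Write, b: Write) -> Write:
    """Last-write-wins: keep whichever write has the larger timestamp."""
    return a if a.timestamp >= b.timestamp else b


def union_merge(a: Write, b: Write) -> Write:
    """Application-aware alternative: keep the items from both sides."""
    return Write(a.items | b.items, max(a.timestamp, b.timestamp))


# Concurrent writes accepted on different sides of a partition.
left = Write(frozenset({"book"}), timestamp=100.0)
right = Write(frozenset({"pen"}), timestamp=100.5)

print(sorted(lww_merge(left, right).items))    # the "book" update is silently dropped
print(sorted(union_merge(left, right).items))  # both updates survive
```

The union merge only works because adding to a set is commutative; removals or overwrites need richer machinery such as the data types on the "Managing Partitions" slide.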
25. Managing Partitions
● Co-ordination Free Systems
○ Coordinate only where required
○ AWS Aurora Paper
● Commutative Data Types (CRDT)
○ Commutative data types: counters, sets
○ Logical monotonicity
● Mergeable Persistent Data Structures
● Version Vectors
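The counter mentioned under commutative data types can be sketched as a grow-only counter (G-Counter), one of the simplest CRDTs. Each node increments only its own slot and replicas merge by taking the per-node maximum, so merges can happen in any order, any number of times, without coordination. The class below is a minimal illustration with hypothetical names, not code from the talk.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged with element-wise max."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        # A node only ever writes to its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two replicas accept increments independently during a partition.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)

# After the partition heals they exchange state in either order.
a.merge(b)
b.merge(a)
b.merge(a)  # merging again changes nothing
print(a.value(), b.value())
```

The per-node slots resemble a version vector, which is why the two ideas appear side by side on this slide.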
26. Further Reading
● https://www.the-paper-trail.org/page/cap-faq/ <= Read this for sure!
● https://aphyr.com/posts/313-strong-consistency-models => Good tree of consistency
● Proof of CAP theorem by Gilbert and Lynch
● https://www.researchgate.net/publication/220476881_CAP_Twelve_years_later_How_the_Rules_have_Changed
● https://queue.acm.org/detail.cfm?id=2582994
● https://codahale.com/you-cant-sacrifice-partition-tolerance/
● Highly Available Transactions: Virtues and Limitations
● Harvest, Yield, and Scalable Tolerant Systems
● http://dataintensive.net/
27. Further Reading
● Clarifications On The CAP Theorem And Data-Related Errors
● Replicated Data Consistency Explained Through Baseball
● Critique of CAP Theorem
● Linearizability: A Correctness Condition for Concurrent Objects
● Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services
● https://people.eecs.berkeley.edu/~brewer/PODC2000.pdf
● http://www.bailis.org/blog/linearizability-versus-serializability/
● https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html
● Impossibility of Distributed Consensus with One Faulty Process