ScyllaDB adopted Raft as a consensus protocol in order to dramatically improve its operational aspects and provide strong consistency to the end user. This talk will explain how Raft behaves in Scylla Open Source 5.0 and introduce the first major end-user-visible improvement: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose a majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
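As a rough illustration of the majority rule Raft relies on (a hedged sketch, not ScyllaDB code): an entry such as a schema change is committed once a strict majority of nodes acknowledge it, which is also why losing a majority of the cluster is the disaster-recovery case.

```python
# Toy illustration of Raft's commit rule: an entry is committed only once
# acknowledged by a strict majority of the cluster, so any two majorities
# overlap in at least one node and the cluster cannot diverge.

def committed(acks: int, cluster_size: int) -> bool:
    # Strict majority: more than half of all nodes have acknowledged.
    return acks > cluster_size // 2

print(committed(2, 3))  # True: 2 of 3 nodes form a majority
print(committed(2, 5))  # False: 2 of 5 do not
print(committed(3, 5))  # True: the smallest majority of 5
```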
No matter how resilient your database infrastructure is, backups are still needed to defend against catastrophic failures. Be it the unlikely hardware failure of all data centers, or the more likely and all-too-human user error. Acknowledging the importance of good backup procedures, the Scylla Manager now natively supports backup and restore operations. In this talk, we will learn more about how that works and the guarantees provided, as well as how to set it up to guarantee maximum resiliency to your cluster.
Agreement in a distributed system is complicated but required. Scylla gained lightweight transactions through Paxos, but that comes at the cost of 3x the round trips. Raft allows consistent transactions without the performance penalty. Beyond LWT, we plan to integrate Raft with most aspects of Scylla, making a leap forward in manageability and consistency.
Should I use more, smaller instances, or fewer, bigger instances? Is 1 Gbps enough for my network cards? Should I use batches? Can I have a collection 3 GB in size? These are just some of the many questions we see users asking daily on our mailing list, Slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1 - ScyllaDB
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
PostgreSQL is a very popular and feature-rich DBMS. At the same time, PostgreSQL has a set of annoying, wicked problems which haven't been resolved in decades. Miraculously, with just a small patch extending the PostgreSQL core API, it appears possible to solve these wicked problems in a new engine built as an extension.
Scylla on Kubernetes: Introducing the Scylla Operator - ScyllaDB
How can Kubernetes be best used to automate the deployment, scaling, and various operations of a Scylla database?
Enter Kubernetes Operators, the way to combine domain-specific knowledge about Scylla with the automation framework of Kubernetes.
In this presentation, we will quickly explore what Kubernetes is and why it works so well, highlight the pain points of running Scylla with just Kubernetes primitives, and show how we extended Kubernetes so that it can correctly operate a Scylla database.
Finally, we will show the Scylla Operator in action and show how easily you can spin up a Scylla cluster with just one command.
Robert Haas
Why does my query need a plan? Sequential scan vs. index scan. Join strategies. Join reordering. Joins you can't reorder. Join removal. Aggregates and DISTINCT. Using EXPLAIN. Row count and cost estimation. Things the query planner doesn't understand. Other ways the planner can fail. Parameters you can tune. Things that are nearly always slow. Redesigning your schema. Upcoming features and future work.
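As a small taste of the planner behavior the talk covers, here is a hedged sketch using Python's built-in sqlite3 module (standing in for any SQL planner; the table and index names are made up) showing a sequential scan turn into an index scan once an index exists. The exact plan strings vary by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# Without an index on email, the planner can only do a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.c",)
).fetchone()
print(plan[3])  # e.g. "SCAN users"

# After adding an index, the same query can use an index search instead.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?", ("a@b.c",)
).fetchone()
print(plan[3])  # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
```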
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Igor Anishchenko
Odessa Java TechTalks
Lohika - May, 2012
Let's take a step back and compare data serialization formats, of which there are plenty. What are the key differences between Apache Thrift, Google Protocol Buffers, and Apache Avro? Which is "the best"? The truth of the matter is that they are all very good, and each has its own strong points. Hence, the answer is as much a matter of personal choice as of understanding the historical context of each and correctly identifying your own individual requirements.
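To make the size tradeoff concrete without pulling in the three libraries, here is a hedged Python sketch contrasting a self-describing text encoding with a schema-based binary layout in the spirit of Thrift, Protocol Buffers, and Avro (the struct format string plays the role of the schema; the record is made up):

```python
import json
import struct

# A record any of the three formats could describe with a schema:
record = {"id": 12345, "score": 3.5, "active": True}

# Self-describing text encoding: field names travel with every message.
text = json.dumps(record).encode()

# Schema-based binary encoding: the layout ("<qd?" = int64, float64, bool)
# lives in the schema shared by writer and reader, not in the payload.
binary = struct.pack("<qd?", record["id"], record["score"], record["active"])

print(len(text), len(binary))  # the binary payload is several times smaller
```

This is the core reason all three formats beat ad hoc JSON on the wire; they differ mainly in how the schema is distributed and evolved.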
Designing how to implement aggregates in a distributed database is a non-trivial task. When dealing with aggregates that will be polling the entire cluster, it is important to consider performance impacts. If done poorly, full table scans can bring production systems to their knees. So how can you implement aggregate functions without hammering real-time availability and performance for other read/write operations? Learn how distributed aggregates were implemented in ScyllaDB to balance performance across large NoSQL distributed database clusters.
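The scatter-gather idea behind distributed aggregates can be sketched in a few lines (illustrative Python, not ScyllaDB's implementation): each shard computes a small partial aggregate locally, and only those partials cross the network, so the coordinator's cost is proportional to the number of shards rather than the number of rows.

```python
# Hypothetical shards, each holding a slice of the rows.
shards = [
    [10, 20, 30],  # rows on shard 0
    [5, 15],       # rows on shard 1
    [40],          # rows on shard 2
]

def partial_avg(rows):
    # Runs locally on each shard: AVG decomposes into (sum, count).
    return (sum(rows), len(rows))

# The coordinator merges the tiny partials instead of streaming every row.
partials = [partial_avg(rows) for rows in shards]
total, count = map(sum, zip(*partials))
print(total / count)  # 20.0
```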
Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer... - confluent
Apache Kafka is a scalable streaming platform with built-in dynamic client scaling. The elastic scale-in/scale-out feature leverages Kafka’s “rebalance protocol,” which was designed in the 0.9 release and has been improved ever since. The original design aims for on-prem deployments of stateless clients. However, it does not always align with modern deployment tools like Kubernetes and stateful stream processing clients like Kafka Streams. Those shortcomings led to two major recent improvement proposals, namely static group membership and incremental rebalancing (which will hopefully be available in version 2.3). This talk provides a deep dive into the details of the rebalance protocol, starting from its original design in version 0.9 up to the latest improvements and future work. We discuss internal technical details, pros and cons of the existing approaches, and explain how to configure your client correctly for your use case. Additionally, we discuss configuration tradeoffs for stateless, stateful, on-prem, and containerized deployments.
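The gist of eager versus incremental rebalancing can be sketched as a set difference (illustrative Python; the consumer and partition names are made up, and this glosses over the real protocol's generations and sticky assignor):

```python
# Assignments before and after consumer "c3" joins the group.
old = {"c1": {0, 1, 2}, "c2": {3, 4, 5}}
new = {"c1": {0, 1}, "c2": {3, 4}, "c3": {2, 5}}

# Eager rebalancing: every consumer revokes everything, then all
# partitions are reassigned -- processing stops for the whole group.
eager_revoked = set().union(*old.values())

# Incremental rebalancing: only partitions that actually change owner
# are revoked; the rest keep processing uninterrupted.
incremental_revoked = {
    p for c, parts in old.items() for p in parts
    if p not in new.get(c, set())
}

print(sorted(eager_revoked))        # [0, 1, 2, 3, 4, 5]
print(sorted(incremental_revoked))  # [2, 5]
```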
Best practices for MySQL High Availability - Colin Charles
The MariaDB/MySQL world is full of tradeoffs, and choosing a high availability (HA) solution is no exception. This session aims to look at all the alternatives in an unbiased way. Preference is of course only given to open source solutions.
How do you choose between: asynchronous/semi-synchronous/synchronous replication, MHA (MySQL high availability tools), DRBD, Tungsten Replicator, or Galera Cluster? Do you integrate Pacemaker and Heartbeat like Percona Replication Manager? The cloud brings even more fun, especially if you are dealing with a hybrid cloud and must think about geographical redundancy.
What about newer solutions like using Consul for MySQL HA?
When you’ve decided on your solution, how do you provision and monitor these solutions?
This and more will be covered in a walkthrough of MySQL HA options and when to apply them.
MySQL and PostgreSQL are the two most popular open-source relational databases. To help choose between them, a comparison of their query optimizers has been carried out. The aim of this session is to summarize the outcome of the comparison and, specifically, to point out optimizer-related strengths and weaknesses.
Inside Cassandra – C* is an interesting piece of software for many reasons, but it is especially interesting in its use of elegant data structures and algorithms. This talk will focus on the data structures and algorithms that make C* such a scalable and performant database. We will walk along the write, read and delete paths exploring the low-level details of how each of these operations work. We will also explore some of the background processes that maintain availability and performance. The goal of this talk is to gain a deeper understanding of C* by exploring the low-level details of its implementation.
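A toy model of the write and read paths discussed here (illustrative Python, not Cassandra's actual code; the size limit is made up): writes land in an in-memory memtable that is flushed to immutable sorted runs (SSTables) when it fills, and reads check the memtable before searching the runs from newest to oldest.

```python
MEMTABLE_LIMIT = 3  # illustrative flush threshold

memtable = {}
sstables = []  # each flush produces one sorted, immutable run

def write(key, value):
    # Writes are cheap: just an in-memory upsert, flushed in bulk later.
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        sstables.append(sorted(memtable.items()))  # flush as a sorted run
        memtable.clear()

def read(key):
    # Newest data wins: memtable first, then runs from newest to oldest.
    if key in memtable:
        return memtable[key]
    for run in reversed(sstables):
        for k, v in run:
            if k == key:
                return v
    return None

for i in range(5):
    write(f"k{i}", i)
print(len(sstables), read("k1"), read("k4"))  # 1 1 4
```

The real database layers commit logs, bloom filters, and compaction on top of this skeleton, but the memtable/SSTable split is the heart of the write path.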
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond - ScyllaDB
Beyond the immediate schema changes supported in Scylla Open Source 5.0, learn how the Raft consensus infrastructure will enable radical new capabilities. Discover how it will enable more dynamic topology changes, tablets, immediate consistency, better and faster elasticity, and simplification to repair operations.
3 Things to Learn About:
-How Kudu is able to fill the analytic gap between HDFS and Apache HBase
-The trade-offs between real-time transactional access and fast analytic performance
-How Kudu provides an option to achieve fast scans and random access from a single API
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook - The Hive
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend heavily on RocksDB. Beyond MyRocks, we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale.
In this session, we will offer an overview of RocksDB, key differences from InnoDB, and share a few interesting lessons learned from production.
In the first part of Galera Cluster best practices series, we will discuss the following topics:
* ongoing monitoring of the cluster and detection of bottlenecks;
* fine-tuning the configuration based on the actual database workload;
* selecting the optimal State Snapshot Transfer (SST) method;
* backup strategies
(video: http://galeracluster.com/videos/2159/)
Kernel Recipes 2019 - Faster IO through io_uring - Anne Nicolas
Since the dawn of time, Linux has had to make do with inferior IO interfaces. Native Linux AIO supports only a niche application class (O_DIRECT), and even for that use case, it’s far too slow for modern storage. This talk will detail io_uring, a modern IO interface for Linux, that’s both fully featured and performant.
Jens Axboe
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records - ScyllaDB
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 Billion records in a counter table to ScyllaDB Open Source.
Archaic database technologies just don't scale under the always-on, distributed demands of modern IoT, mobile, and web applications. We'll start this Intro to Cassandra by discussing how its approach is different and why so many awesome companies have migrated from the cold clutches of the relational world into the warm embrace of peer-to-peer architecture. After this high-level opening discussion, we'll briefly unpack the following:
• Cassandra's internal architecture and distribution model
• Cassandra's Data Model
• Reads and Writes
Modeling Data and Queries for Wide Column NoSQL - ScyllaDB
Discover how to model data for wide-column databases such as ScyllaDB and Apache Cassandra. Contrast the difference from traditional RDBMS data modeling, going from a normalized “schema first” design to a denormalized “query first” design. Plus, learn how to use advanced features like secondary indexes and materialized views to use the same base table to get the answers you need.
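The "query first" shift can be sketched in plain Python (a toy model; the sensor-readings example and all names are made up): instead of normalizing, the table is organized around one query, partitioned by the lookup key and kept in the clustering order that query needs, so answering it is a single sequential read of one partition.

```python
from collections import defaultdict

# Target query: "latest readings for a sensor" -> partition by sensor_id,
# cluster by timestamp descending. Duplicating data per query is expected
# in a denormalized, query-first design.
readings_by_sensor = defaultdict(list)

def insert(sensor_id, ts, value):
    readings_by_sensor[sensor_id].append((ts, value))
    readings_by_sensor[sensor_id].sort(reverse=True)  # clustering order

insert("s1", 100, 20.5)
insert("s1", 200, 21.0)
insert("s2", 150, 19.0)

# The query reads one partition sequentially: no joins, no scatter.
print(readings_by_sensor["s1"][0])  # (200, 21.0)
```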
Cassandra is a highly scalable, eventually consistent, distributed, structured columnfamily store with no single points of failure, initially open-sourced by Facebook and now part of the Apache Incubator. These slides are from Jonathan Ellis's OSCON 09 talk: http://en.oreilly.com/oscon2009/public/schedule/detail/7975
How to leave the ORM at home and write SQL - MariaDB plc
Looking to understand the basics of relational databases and the ubiquitous structured query language (SQL)? This is the session for you. Senior Software Engineer Assen Totin starts with an introduction to relational database theory and quickly moves to practical examples of SQL with simple, single-table selects, joins, and aggregates.
Cassandra Day Denver 2014: Introduction to Apache Cassandra - DataStax Academy
Speaker: Jon Haddad, Technical Evangelist for Apache Cassandra at DataStax
This is a crash course introduction to Cassandra. You'll step away understanding how it's possible to utilize this distributed database to achieve high availability across multiple data centers, scale out as your needs grow, and not be woken up at 3am just because a server failed. We'll cover the basics of data modeling with CQL, and understand how that data is stored on disk. We'll wrap things up by setting up Cassandra locally, so bring your laptops!
Optimizing NoSQL Performance Through Observability - ScyllaDB
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor!
Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track.
This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on
This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
Event-Driven Architecture Masterclass: Challenges in Stream Processing - ScyllaDB
Discuss the core tradeoffs and considerations involved in order-free and ordered stream processing. Brian Taylor walks through the pros and cons of three different approaches: no data dependency, deferred inter-event data dependency, and streaming inter-event data dependency.
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac... - ScyllaDB
We start by setting common ground, introducing why relational databases fall short and addressing common EDA characteristics such as the need for real-time response times and schemaless approaches that make it easier to adapt to recurring changes and onboard new use cases. Next, interact with a sample Rust-based application: a social network app demonstrating an integration of both ScyllaDB and Redpanda.
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance... - ScyllaDB
Discover how to avoid common pitfalls when shifting to an event-driven architecture (EDA) in order to boost system recovery and scalability. We cover Kafka Schema Registry, in-broker transformations, event sourcing, and more.
Developer Data Modeling Mistakes: From Postgres to NoSQL - ScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
What Developers Need to Unlearn for High Performance NoSQL - ScyllaDB
See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning
Low Latency at Extreme Scale: Proven Practices & Pitfalls - ScyllaDB
Expert tips on how to maximize your database performance at scale
Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit millisecond latencies.
In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams.
We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out
About the speaker:
Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an open-access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.
Beyond Linear Scaling: A New Path for Performance with ScyllaDB - ScyllaDB
Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Navigating Complex Database Performance Hurdles
Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments.
Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out
Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.
Database Performance at Scale Masterclass: Workload Characteristics by Felipe... (ScyllaDB)
Felipe Cardeneti Mendes, Solutions Architect at ScyllaDB
Navigating workload-specific performance challenges and tradeoffs.
Felipe Mendes covers how to navigate the top performance challenges and tradeoffs that you’re likely to face with your project’s specific workload characteristics and technical/business requirements.
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya... (ScyllaDB)
Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB
What performance-minded engineers need to know.
Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals – specifically, what to look for if you need latency and/or throughput improvements.
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna (ScyllaDB)
Piotr Sarna, Software Engineer at Turso
Understanding and tapping your driver’s performance potential.
Piotr Sarna discusses how to get the most out of a driver, particularly from the performance perspective, and select a driver that’s a good fit for your needs.
Technical risks of putting a cache in front of your database, and what to do instead
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching is a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of ScyllaDB's specialized row-based cache
- Real-world examples of why and how teams eliminated their external cache with ScyllaDB
Powering Real-Time Apps with ScyllaDB: Low Latency & Linear Scalability (ScyllaDB)
Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample real-time app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate.
Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
7 Reasons Not to Put an External Cache in Front of Your Database.pptx (ScyllaDB)
Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture.
Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching can be a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of specialized row-based caches
- Real-world examples of why and how teams eliminated their external cache
Expert tips on how to maximize your database potential
If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case?
This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration (ScyllaDB)
In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, cover Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
In this talk, Lubos discusses tools and methods for a successful migration. He covers:
Methods
Data (re)modeling
APIs
Spark Migrator
DS bulk
Tuning
Testing/monitoring
NoSQL Data Migration Masterclass - Session 1: Migration Strategies and Challenges (ScyllaDB)
In this talk, Jon discusses practical strategies and issues to consider. He covers:
Reasons for Migrations
DB Functionality
Cost/Licensing
Outdated Technology
Scaling Problems
Technology Evolution
SQL to NoSQL
2. Konstantin Osipov
■ Worked on lightweight transactions in Scylla
■ Crazy about distributed system testing
■ Lives in Moscow, father of two
Director of Engineering, ScyllaDB
4. Scylla schema features
You can:
CREATE or DROP
● KEYSPACE, TABLE,
● VIEW, INDEX,
CHANGE OPTIONS:
● replication, table, view and index options
You can’t:
MODIFY
● RENAME keyspace, table, column
● CHANGE column type
CONSTRAINT:
● UNIQUE, CHECK, FOREIGN
5. Consistency Model of Schema Changes
[Slide diagram: over time, different nodes evolve the same table's schema independently. Node A's copy gains an email column, Node B's copy gains a phone column, and a third variant holds both email and phone. The result is a split brain: nodes disagree on the schema of the same table.]
6. (In)consistency of Schema Changes
cqlsh:test> create table t (a int primary key);
----------------------------------------------- split ------------------------------------------
cqlsh:test> alter table t rename a to d;
Warning: schema version mismatch detected
cqlsh:test> insert into t (d) values (1);
Cannot execute this query as it might involve data filtering and thus
may have unpredictable performance.
cqlsh:test> insert into t (a) values (1);
Unknown identifier a
7. Schema changes need strong consistency
In Cassandra… In Scylla…
CASSANDRA-10250,
CASSANDRA-10699,
CASSANDRA-14957,
…
#2921, #4648,
#6455, #7426,
#8968, #9774 …
9. Raft intro
Raft is a protocol for state machine replication.
What does it mean?
■ The majority of nodes have the same state
■ State transition happens in the same order on all
nodes
■ Cluster topology is part of the state
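The idea can be illustrated with a toy Python sketch (invented names, not ScyllaDB code): if every node applies the same log of commands in the same order, all replicas converge to the same state.

```python
# Toy model of state machine replication: identical logs applied in
# identical order produce identical state on every node.

class Node:
    def __init__(self):
        self.state = {}   # the replicated state (e.g. schema definitions)
        self.log = []     # ordered log of committed commands

    def apply(self, command):
        key, value = command
        self.log.append(command)
        self.state[key] = value

def replicate(nodes, command):
    # In real Raft the leader replicates to a majority before applying;
    # here we simply deliver the command to every node in order.
    for node in nodes:
        node.apply(command)

nodes = [Node(), Node(), Node()]
for cmd in [("x", 1), ("y", 2), ("z", 3)]:
    replicate(nodes, cmd)

# All three replicas converge to the same state.
assert all(n.state == {"x": 1, "y": 2, "z": 3} for n in nodes)
```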
10. How Raft achieves consistency
[Slide diagram: three nodes (A, B, C), each consisting of a consensus module, a state machine, and a log holding the same entries x←1, y←2, z←3. The consensus module replicates the log across nodes, and each state machine applies the log entries in the same order.]
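To make the majority rule concrete, here is a toy Python sketch (not ScyllaDB code; all names invented) of Raft's commit rule: an entry counts as committed once it is replicated on a majority of servers.

```python
# Raft commit rule, illustrated: the commit index is the highest log
# index known to be replicated on a majority of the cluster.

def majority(n):
    return n // 2 + 1

def commit_index(match_index, cluster_size):
    # match_index[i] = highest log index known replicated on server i.
    # Sort ascending; the entry at position (size - majority) is the
    # highest index that at least a majority of servers have.
    return sorted(match_index)[cluster_size - majority(cluster_size)]

# 5 servers: entries up to index 7 are on 3 of them -> index 7 committed.
assert majority(5) == 3
assert commit_index([7, 7, 7, 4, 2], 5) == 7
# With only 2 of 5 servers at index 7, only index 4 is committed.
assert commit_index([7, 7, 4, 4, 2], 5) == 4
```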
13. How Scylla Raft is special
Scylla Raft implements a number of important extensions:
■ Increased liveness for very large clusters (1000+ nodes)
■ Resilience against asymmetric network failures
■ Read and write support on all cluster nodes
■ Efficient multi-raft: every node can replicate many state machines
14. Setting up a fresh cluster
On a fresh start, a Scylla node:
■ Generates and persists a unique random Server ID (UUID)
■ Contacts all known peers
■ Creates a new Raft Group ID and a new cluster, but strictly after:
• contacting all peers in the seeds: list
• exchanging all known Server IDs
• AND not finding an existing cluster
• AND only if its own Server ID is lexicographically the smallest
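The bootstrap rule can be sketched in a few lines of Python (hypothetical helper names; the real logic lives in Scylla's C++ code): a node creates a new cluster only when no peer reports an existing cluster and its own Server ID is the smallest seen.

```python
# Hypothetical sketch of the fresh-cluster bootstrap decision:
# exactly one node (the one with the smallest Server ID) may create
# a new cluster, and only when no existing cluster was discovered.

import uuid

def should_create_cluster(my_id, peer_ids, existing_cluster_found):
    if existing_cluster_found:
        return False  # join the existing cluster instead
    # Python UUIDs compare by value, which stands in for the
    # "lexicographically smallest" rule on the slide.
    return all(my_id < peer for peer in peer_ids)

my_id = uuid.UUID("00000000-0000-0000-0000-000000000001")
peers = [uuid.UUID("00000000-0000-0000-0000-000000000002"),
         uuid.UUID("00000000-0000-0000-0000-000000000003")]

assert should_create_cluster(my_id, peers, existing_cluster_found=False)
assert not should_create_cluster(my_id, peers, existing_cluster_found=True)
```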
17. How do Scylla schema changes on Raft work?
To execute a DDL statement, the server:
■ Takes Raft read barrier
■ Reads the latest schema and validates CQL
■ Builds Raft command and signs it with old and new schema id
■ Once command is committed, it’s applied only if old schema id is the same
■ Retries if commit or apply failed
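The conditional-apply step behaves like a compare-and-swap on the schema id. Below is a toy Python sketch with invented names (Scylla's real schema ids are computed differently): a command carries the schema id it was built against, and the state machine applies it only if the current id still matches.

```python
# Toy compare-and-swap on a schema id: a committed DDL command is
# applied only if the schema it was built against is still current;
# otherwise the coordinator rebuilds the command and retries.

import hashlib

def schema_id(columns):
    # Invented stand-in for a real schema digest.
    return hashlib.sha1(repr(sorted(columns)).encode()).hexdigest()

def apply_ddl(state, command):
    expected_old_id, new_columns = command
    if schema_id(state["columns"]) != expected_old_id:
        return False  # schema changed concurrently: caller must retry
    state["columns"] = new_columns
    return True

state = {"columns": ["a"]}
old_id = schema_id(state["columns"])
assert apply_ddl(state, (old_id, ["a", "b"]))      # applies cleanly
assert not apply_ddl(state, (old_id, ["a", "c"]))  # stale id: rejected
assert state["columns"] == ["a", "b"]
```

Because every command is ordered through the Raft log and applied conditionally, two concurrent DDL statements can no longer silently overwrite each other; the loser simply observes the new schema and retries.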
19. Solved issues
■ Concurrent DDL is now safe
■ still under --experimental-features-raft
■ Enabled if all nodes are 5.0
20. Introduced issues
Raft prefers CONSISTENCY over AVAILABILITY. What does that mean?
■ 2-data-center setups become more fragile
■ Prefer an odd number of DCs to avoid split brain
■ Import sstables into a new cluster in case of a permanent loss of the majority
■ A 5.0 cluster with Raft can’t downgrade to 4.x
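The "odd number of DCs" advice follows from simple quorum arithmetic, sketched below (illustrative Python only):

```python
# Raft makes progress only while a strict majority of all nodes is
# reachable. With two symmetric DCs, losing either one leaves exactly
# half the nodes, which is not a majority.

def has_quorum(alive, total):
    return alive > total // 2

# 2 DCs x 3 nodes: losing one DC leaves 3 of 6 -> no quorum.
assert not has_quorum(alive=3, total=6)
# 3 DCs x 2 nodes: losing one DC leaves 4 of 6 -> quorum survives.
assert has_quorum(alive=4, total=6)
```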
22. Thank you!
Stay in touch
Konstantin Osipov
@kostja_osipov
kostja@scylladb.com
Editor's Notes
Hi, I’m Kostja, and today I’m going to talk about schema changes on Raft.
The easiest way to think about database schema comes from the world of relations, where it’s a collection of headers of relational tables. Each column has a column name and a data type, meaning all cells in this column conform to the constraints of this type. The need for a schema comes from the desire to save on storage, specifically to avoid storing the same name and type information in each cell, as well as to make relational algebra possible. Join conditions are based on cell equality, something difficult to define for cells with different types.
Just like relational databases, Scylla requires column types and enforces type constraints. It also supports complex types, such as sets, maps, lists and user-defined types. This makes its data representation more compact compared to document databases, but does require clients to think through their schemas when designing an application. A less commonly considered part of a database schema are replication and storage properties, indexes, views, and access rights. Finally, there is the power of data definition verbs to change the existing schema: it could be possible to add or drop columns, change column types, or add unique, check and foreign key constraints.
Replicating schema statements across nodes has to use its own path, as it impacts all nodes in the cluster and requires coordinated error handling: you can’t fail an operation on one node while succeeding on another. Some data definition operations require a complete scan or even a rebuild of the data in the table. E.g. building a unique index must check that the table has no duplicates in the given column. Changing a column type may require converting each cell from one physical representation to another.
Scylla currently avoids rewriting data in schema change operations. Instead, it transforms data on the fly, to make it conform to the client schema. For example, if the current schema doesn’t contain a column but the actual row still stores it, the column is removed from the row before it’s returned to the user. The advantage of this approach is that existing schema change statements are lightweight and less error-prone. Some complex data transformations, however, have to be done on the client, which has to concern itself with the operation’s consistency.
In a distributed environment each node can have a slightly different version of the schema. To be able to return consistent results in this environment, Scylla signs data retrieval operations passed between nodes with the schema version. The receiving node must make sure that the returned data is in the format required by the client. If it is a newer format that this node is not yet aware of, it will request the schema information from the sender.
Let’s recap what you can and can’t do in Scylla:
You can: create tables, views, UDTs, add or drop columns, truncate data, define roles and access rights.
You can’t: change column types (you can do compatible changes, like text to blob), have unique secondary keys, triggers or constraints, or rename a column.
Some of the unsupported operations can’t be implemented in an eventually consistent environment. It would be difficult to support unique secondary keys unless the definition of uniqueness is the same on all nodes: a duplicate may slip through the cracks when a node is being added and data is being moved from one node to another. Being a multi-consistency-model database, it’s not impossible that Scylla will eventually get these advanced schema features, so we’d better begin building a foundation for them now.
Questions to the audience:
How many people would like to have unique secondary keys?
How many people use materialized views?
How many people are unhappy with materialized views in Scylla? Why?
To see why Scylla supports some schema changes and not others, let’s consider how schema changes may fail and how Scylla recovers from such a failure.
One obvious kind of failure is a down node. Schema changes are allowed to proceed even if all but one node in the cluster are down. What happens then? When the nodes come back up, they’ll learn through gossip that the cluster has received a schema update and fetch it from the node that has it. But what if a node or nodes are not down, but are partitioned away?
It’s possible that two subsets of the cluster operate using different versions of the schema. Eventually, when the partitioning heals, the nodes learn about the changes, and as they run compaction their data is converted to the most recent format.
A more complex scenario is when two concurrent changes conflict. E.g. imagine one part of the cluster adds a column to a table, and another part adds a different column with the same name. The schema with the most recent timestamp is going to win, but if the column received any updates corresponding to the “shadowed” definition of the schema, they will be lost after schema reconciliation. There are many other ways in which eventual schema consistency may backfire:
You may not see the table you just created. “Schema agreement” happens via gossip, so it can easily take seconds. If the node receiving a write is not the same node which received the schema change, it does not learn about the change immediately
Clients may get incorrect errors since schema changes use local node’s view of the schema, which might not be fully up to date.
there is no prevention against duplicate attempts to create any object, not just columns: e.g. a keyspace, table, view or index. Both concurrent attempts may succeed, with the object that has the newer creation timestamp shadowing the older one with the same name, along with its entire contents.
Not all divergences in schema can be merged with eventual consistency rules. Dropping a user-defined type while the schema relies on it renders the table unusable; the contents of the table become inaccessible.
Concurrent changes to the same object can be lost, like some added columns in Cassandra issue 10250.
Interaction of schema changes and topology changes produces another bouquet of gotchas:
Changing a keyspace’s replication factor does not take immediate effect, and if done during a topology change it can lead to data loss.
A failure during topology change can violate LWT linearizability
Writes may be lost during topology changes
To sum up, the only (dubious) advantage of the algorithm is total liveness: the cluster is able to make progress with data definition verbs even in the presence of a majority failure. The client is responsible for always making schema changes through the same node, thus manually enforcing linearizability. Schema reconciliation algorithms are not a feature, as their specific behaviour is not documented. Rather, they are a best effort to patch up otherwise undefined behaviour.
The problem has been acknowledged in the Cassandra and Scylla communities as far back as 2015, with duplicate issues constantly piling up. Strongly consistent features of Scylla, such as LWT and the upcoming tablets, aren’t really consistent unless DDL and topology changes are consistent as well.
Enter Raft, the base algorithm used for strong consistency in Scylla 5.0. Let’s talk about it in more detail.
Raft is often called a protocol for state machine replication. A database is a kind of state machine, and replicating a database means having the same copy of the data on every node. For schema, the replicated state is the keyspace, table and view definitions. By means of Raft we can make sure each cluster node not just has the same copy of the data, but applies all state changes - that is, data definition commands - in the same order. Moreover, if nodes restart, join or leave, the order must stay the same. System liveness must be preserved as long as the majority of the cluster is up. Handling node failures, joining and leaving as part of the protocol for data replication was the primary reason Scylla settled on using Raft for schema consistency. Let’s discuss it in more detail.
For a deep dive into Raft, I recommend “Raft study”, a video lecture by John Ousterhout, as well as the Raft PhD thesis - key chapters are 1 through 4. If you’re looking into writing your own implementation, having studied many, I encourage you to look into Scylla Raft - in my opinion, it is highly isolated, well commented and carefully tested.
So what’s the idea of a replicated state machine? Suppose you had a program or an application that you wanted to make reliable. One way to do that is to execute that program on a collection of machines and ensure they execute it in exactly the same way. So a state machine is just a program or an application that takes inputs and produces outputs.
A replicated log can help to make sure that these state machines execute exactly the same commands. Here’s how it works. A client of the system that wants to execute a command passes it to one of these machines. That command, let’s call it X, then gets recorded in the log of the local machine, and then, in addition, the command is passed to the other machines and recorded in their logs as well. Once the command has been safely replicated in the logs, it can be passed to the state machines for execution. And when one of the state machines is finished executing the command, the result can be returned back to the client program. As long as the logs on the state machines are identical, and the state machines execute the commands in the same order, we know they are going to produce the same results. So it’s the job of the consensus module to ensure that the command is replicated and then pass it to the state machine for execution.
The system makes progress as long as any majority of the servers are up and can communicate with each other. (2 out of 3, 3 out of 5).
In Raft, the servers are not equal at any given point in time. Clients communicate with the leader, and the leader communicates with other servers to replicate commands.
This decomposes the consensus problem into two parts:
Normal operation, when there is a running leader
Leader election: what to do when the leader crashes and a new leader needs to be elected
Being an otherwise asymmetric, leader-based algorithm, Raft falls back to symmetry for leader election. Any follower that doesn’t get updates from the leader for the duration of an election timeout can become a candidate and request others to vote for it. The candidate which gets a majority of votes declares itself the leader and begins replicating logs to all other members of the cluster. Split votes, i.e. situations when no candidate gets a majority of votes to become a leader, are possible. In November 2020 Cloudflare recorded a multi-hour outage due to a prolonged failure to elect a leader in the presence of an asymmetric network failure: packets were routed from one part of the network to the other, but not back. Nodes would repeatedly time out and request votes, and this would upset the existing leader. Newer versions of Raft implement an extension called pre-voting, which mandates that fresh candidates dry-run an election before starting a real one and disrupting the existing leader. Nodes which get pings from the current leader vote negatively during pre-vote. If the pre-vote does not collect a majority of votes, the candidate doesn’t start the real election. Scylla has pre-voting implemented and always on. It turns out this allows us to simplify other parts of Raft, specifically to drop the rules related to sticky leadership. This was made possible thanks to our decision to implement Raft from scratch on top of Seastar rather than adopt an existing implementation.
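Here is a minimal sketch of the pre-vote rule just described. The classes are illustrative, not Scylla's implementation: a follower that still hears from a live leader votes "no" in the dry run, so a flapping node cannot disrupt a healthy cluster:

```python
class Follower:
    def __init__(self, heard_from_leader):
        # True if this follower recently got pings from the current leader.
        self.heard_from_leader = heard_from_leader

    def pre_vote(self):
        # Vote negatively while the current leader looks alive.
        return not self.heard_from_leader


def pre_vote_succeeds(followers):
    """Dry-run an election: the candidate counts its own vote plus every
    follower that answers yes, and only starts a real election on a
    majority of the full cluster (candidate + followers)."""
    yes = 1 + sum(1 for f in followers if f.pre_vote())
    majority = (len(followers) + 1) // 2 + 1
    return yes >= majority


# Healthy 5-node cluster: the leader is alive, so the pre-vote fails
# and no disruptive real election is started.
healthy = [Follower(heard_from_leader=True) for _ in range(4)]
assert not pre_vote_succeeds(healthy)

# Dead or unreachable leader: followers stopped hearing from it,
# so the pre-vote collects a majority and a real election may begin.
partitioned = [Follower(heard_from_leader=False) for _ in range(4)]
assert pre_vote_succeeds(partitioned)
```

This is exactly the property that would have prevented the vote storm in the Cloudflare incident: the node with a one-way link keeps failing its pre-vote and never increments the real term.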
Another important reason Raft is so valuable for Scylla is that cluster configuration, or topology changes, are part of the Raft core. To add or remove a node in Raft, the client applies a special command to the log, and once it’s replicated to the current majority, a new majority is formed. Scylla uses an extended, two-step configuration change procedure, allowing it to add or remove more than one node at a time, or add and remove nodes in a single configuration change. This lets us replace a node without the risk of rendering the cluster unusable if the replace fails. Another advanced feature of Scylla Raft is the ability to add non-voting members to the cluster. A non-voting member acts as a normal Raft node but can neither vote nor get elected. In Scylla, new nodes join the cluster as non-voting members, so a join failure doesn’t impact the Raft quorum (the rules for determining majority and thus making progress). A node that failed to join can be easily removed even if some nodes in the cluster are down. Once a node has completed advertising tokens and transferring ranges, it becomes a full voting member of the ring.
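The non-voting member idea can be illustrated with a tiny quorum calculation. The `Member` structure and promotion step are hypothetical, shown only to make the quorum arithmetic concrete:

```python
class Member:
    def __init__(self, name, can_vote):
        self.name = name
        self.can_vote = can_vote


def quorum_size(members):
    """Majority is computed over voters only; non-voters don't count."""
    voters = sum(1 for m in members if m.can_vote)
    return voters // 2 + 1


# A healthy 3-node cluster needs 2 nodes for a majority.
cluster = [Member(f"n{i}", can_vote=True) for i in range(3)]
assert quorum_size(cluster) == 2

# A new node joins as a non-voter: the quorum is unchanged, so if the
# join fails, the remaining voters can still make progress and remove it.
cluster.append(Member("joining", can_vote=False))
assert quorum_size(cluster) == 2

# Only after the node finishes advertising tokens and streaming ranges
# is it promoted to a full voter, and the quorum grows with it.
cluster[-1].can_vote = True
assert quorum_size(cluster) == 3
```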
There are other ways in which Scylla Raft is special. Given that Scylla clusters can be quite large, we paid special attention to making sure elections are swift. Scylla randomizes each node’s election timeout (the interval after which the node starts an election if it doesn’t hear from the current leader) and spreads it proportionally to the cluster size. Even in a 1000-node cluster, each node will start an election roughly in its own time, allowing it to request votes from followers without interference from other candidates. Thanks to this, the typical time to elect a new leader is one to three seconds.
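A sketch of such proportional spreading might look like this. The constants and the exact spreading formula are illustrative assumptions, not Scylla's actual values:

```python
import random

def election_timeout(node_index, base_ms=1000, slot_ms=150):
    """Give each node its own randomized time slot, spread out by its
    position in the cluster, so candidates rarely collide even in a
    very large cluster."""
    return base_ms + node_index * slot_ms + random.uniform(0, slot_ms)


# In a 1000-node cluster each node wakes up roughly in its own slot:
timeouts = [election_timeout(i) for i in range(1000)]
assert timeouts[0] < timeouts[10] < timeouts[999]
assert min(timeouts) >= 1000
```

The key design point is that the jitter per node is smaller than the spacing between slots, so the first node to time out almost always completes its vote request before the next candidate appears.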
We coded support for Raft read barriers and automatic forwarding of commands to the leader. This made Scylla Raft more symmetric, allowing followers to execute major parts of schema changes and offloading the leader, which only needs to log the mutations - changes to system table rows. Finally, we greatly reduced the cost of failure detection by allowing multiple Raft instances (we call these Raft groups) on a node to share a single failure detector. We plan to run a mini-Raft cluster (a replica set) for each tablet, so we reduced the failure detection overhead a single replica set incurs as much as we could.
The Scylla Raft implementation is isolated from disk and network, which allowed us to test it extensively without having to create large clusters. A minute-long test runs hundreds of thousands of configuration events alongside injected network splits, node failures and packet drops.
Raft addresses many issues, but provides only basic initial cluster setup. In Scylla it’s long been a rule that nodes must be added to the cluster one at a time, and the operator has to wait for the joining node to advertise its tokens and complete streaming before joining the next node. Apart from slowing down the overall join procedure and limiting the elasticity of the cluster, this procedure is also inherently unsafe: there was no safety net for operator error - for example, starting two new nodes at once could lead to some ranges being replicated incorrectly. Joining relied on gossip to propagate tokens, which added constant unpleasant delays to the procedure. With the transition to Raft, we wanted to address this problem as well. For it, we introduced a new protocol for cluster assembly, which we called “cluster discovery”. The idea of the protocol is as follows: if a joining node is able to contact any cluster node which already has an existing Raft group, it uses that group’s configuration change machinery to join itself. But if we start several fresh nodes, there is no cluster yet. In that case, each new node continuously discovers its peers until it can build a complete map of the cluster, and as soon as it is able to do so, it may start a new Raft group.
Consider a fresh ring of 5 nodes which haven’t formed a cluster yet. Imagine each node in the ring has information about initial contact points in the seeds section of its configuration file. The node contacts its initial peers and finds out there are other nodes in the cluster. It continues doing so until no new members appear and all existing members have responded. Then a new Raft configuration can be formed. Distributed protocols are not considered safe unless proved correct. To validate the correctness of the new Scylla discovery protocol, we created a TLA+ specification and ran it to completion for all reasonable cluster sizes. In 5.0, where we must support pre-Raft nodes, the discovery protocol lives side by side with gossip. In the future, we’ll be able to switch over and make Scylla cluster boot take subsecond time.
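The discovery loop described above can be sketched as a transitive closure over peer lists. The data structures here are illustrative - real nodes exchange peer information over RPC, not via a shared dictionary:

```python
def discover(my_name, seeds, peer_lists):
    """Repeatedly contact every known peer and merge in the peers it
    knows about, until the membership map stops growing.

    peer_lists maps a node name to the peers that node currently knows,
    standing in for the RPC responses a real node would receive."""
    known = {my_name} | set(seeds)
    while True:
        before = set(known)
        for peer in before:
            known |= set(peer_lists.get(peer, []))
        if known == before:
            # No new members appeared and everyone responded:
            # discovery has converged, a Raft group can be formed.
            return sorted(known)


# Five fresh nodes; each config file lists only one or two seeds,
# yet every node converges to the full five-node map.
peer_lists = {
    "n1": ["n2"], "n2": ["n3"], "n3": ["n4", "n5"], "n4": [], "n5": [],
}
members = discover("n1", seeds=["n2"], peer_lists=peer_lists)
assert members == ["n1", "n2", "n3", "n4", "n5"]
```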
With such a strong foundation, implementing schema changes on Raft was a breeze.
As you perhaps know, Scylla stores all schema definitions in system tables. Each node has a copy, and the node which received the mutation propagates it to all other nodes.
So the biggest change in 5.0 is that this propagation of the change is no longer eventual but happens through Raft.
All cluster nodes become part of a Raft group we internally call “group 0”, since it’s the first group ever. The group has a single leader which actively pushes all changes to all members. Any member wishing to make a change forwards it to the leader, which commits it on the majority before it is materialized as a new schema version.
If a node is disconnected from the leader, or disconnected from the majority, it can no longer make a schema change.
For a connected node, the steps are as follows:
Before a node executes the command, it reads the latest schema, issuing a Raft read barrier. Imagine you want to drop a table. You need to make sure the table exists, and if it doesn’t, return an error to the client. Similar validations happen for all CQL commands, and as a result a change to the system schema is built, which is recorded in the Raft log. But what if two nodes try to make changes at the same time? Both of them may be able to record their changes in the log. Or, if the connection to the leader is flickering, we may end up in a state of uncertainty, when the command has executed but the caller doesn’t know the outcome. To protect against double execution, or execution in the wrong order, each command is signed with the old and new schema ids. When the command is applied, the current state of the schema must match the old id, and the new id is recorded in the schema version history, so that stale commands fail. In cases of uncertainty, the change id is used to check whether the command was indeed applied to the state machine.
If a race is detected, application of the command turns into a no-op and the entire procedure is restarted.
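The old-id/new-id guard works like a compare-and-swap on the schema version. The sketch below shows the mechanism under simplified assumptions; the class and command format are illustrative, not Scylla's internal representation:

```python
import uuid

class SchemaStateMachine:
    def __init__(self):
        self.version = uuid.UUID(int=0)   # initial schema version
        self.history = [self.version]     # schema version history
        self.tables = set()

    def apply(self, command):
        old_id, new_id, op, table = command
        if self.version != old_id:
            # The schema moved on since this command was built:
            # a concurrent change won the race, so this one is a no-op
            # and the caller restarts the whole procedure.
            return "no-op"
        if op == "create":
            self.tables.add(table)
        elif op == "drop":
            self.tables.discard(table)
        self.version = new_id
        self.history.append(new_id)       # lets callers resolve uncertainty
        return "applied"


sm = SchemaStateMachine()
v1 = uuid.uuid4()
# Two nodes both read the schema at version 0 and race to change it.
cmd_a = (uuid.UUID(int=0), v1, "create", "users")
cmd_b = (uuid.UUID(int=0), uuid.uuid4(), "create", "events")
assert sm.apply(cmd_a) == "applied"
assert sm.apply(cmd_b) == "no-op"  # stale old id: the loser retries
```

The history list also answers the uncertainty case: a caller that lost its connection mid-command can check whether its new id ever appeared in the version history.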
Did the switch to Raft have any impact on CRUD performance or availability? The way insert, update, or delete works with the schema in 5.0 is similar to 4.0. The coordinator signs the mutation with its schema version. If a mutation with an older version arrives at a node with a newer schema, it is automatically converted to the newer version. If a mutation with a newer version arrives at a node with an older schema, that node will fetch the schema from the coordinator. A schema fetch doesn’t need a leader to be around and can happen from any peer. This preserves DML availability guarantees during Raft leader changes or a network partition. The Raft leader is constantly pushing schema updates to all nodes, so cases of outdated schema should be much rarer.
Conflicts between DDL and topology are still resolved eventually. Specifically, changes to keyspace replication factor still take place without actual data replication, and when adding a node we still use gossip to wait for “schema agreement”.
Let’s recap.
Starting from Scylla 5.0, concurrent DDL is safe. Anomalies such as spurious errors and shadowed keyspaces, tables or columns are impossible. Schema propagation happens much faster, making it easier to write day-to-day applications. The feature still requires an experimental switch, and once it is enabled, there is no downgrade path. We’re actively working on weeding out the remaining bugs and dropping the experimental flag, which we hope to do later this year.
Heterogeneous clusters continue to work, but use the old gossip-style communication to propagate schema. Starting from the next major release, Raft schema management will become the default, making the problems we discussed today strictly legacy.
Not all changes introduced by Raft are rosy. There are cases where Raft’s preference of consistency over availability may impact production deployments.
Raft preserves liveness as long as you have a majority of nodes. A split brain in a two-DC setup is one notable case where the majority can be permanently lost. Scylla 5.0 with Raft will not admit any DDL statements during a split brain, while DML will continue to work.
We’re looking into introducing a nodetool command to promote such an isolated cluster into a new one; however, if the network split is temporary, this would be the wrong answer to the problem. Question: how much of an impact is this to you?
Use of Raft doesn’t stop with schema changes. I welcome you to attend a talk by our distinguished engineer, Tomasz Grabiec, to learn more about our future plans.
Thanks for attending! This was a session about schema changes on Raft in Scylla 5.0.