Scylla Summit 2022: Making Schema Changes Safe with Raft

How to be Successful with Scylla

Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection 3GB in size? These are just some of the many questions we see users asking on a daily basis on our mailing list, on Slack, and in corporate support tickets. In this talk, I will explore the answers to these common questions and help you make sure that your deployment meets the highest standards.

InfluxDB IOx Tech Talks: Replication, Durability and Subscriptions in InfluxD...

InfluxData

Scylla Summit 2022: Scylla 5.0 New Features, Part 1

Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks covers the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries

To watch all of the recordings hosted during Scylla Summit 2022, visit our website: https://www.scylladb.com/summit.

Solving PostgreSQL wicked problems

Alexander Korotkov

Scylla on Kubernetes: Introducing the Scylla Operator

How can Kubernetes best be used to automate the deployment, scaling, and various operations of a Scylla database? Enter Kubernetes Operators, which combine domain-specific knowledge about Scylla with the automation framework of Kubernetes. In this presentation, we will quickly explore what Kubernetes is and why it works so well, highlight the pain points of running Scylla with just Kubernetes primitives, and show how we extended Kubernetes so that it can correctly operate a Scylla database. Finally, we will demonstrate the Scylla Operator in action and show how easily you can spin up a Scylla cluster with a single command.
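At the heart of the operator pattern is a reconciliation loop: it repeatedly compares the desired state (for example, the cluster size the user declared) with the observed state and takes a corrective step. The sketch below is a minimal, hypothetical illustration of that loop; `ObservedCluster`, `reconcile`, and the action strings are invented for this example and are not the Scylla Operator's API, and the rule that nodes join or leave one at a time is an assumption of the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObservedCluster:
    """Hypothetical, simplified view of a running cluster: just member names."""
    members: list[str] = field(default_factory=list)


def reconcile(desired_size: int, observed: ObservedCluster, prefix: str = "scylla") -> list[str]:
    """One pass of a control loop: compare desired vs. observed state.

    Returns at most one action per pass, because this sketch assumes nodes
    must join or leave the cluster one at a time.
    """
    current = len(observed.members)
    if current < desired_size:
        return [f"add {prefix}-{current}"]
    if current > desired_size:
        # Remove the most recently added member first.
        return [f"decommission {observed.members[-1]}"]
    return []  # converged: nothing to do


# Simulate the loop converging from 1 node to 3.
cluster = ObservedCluster(["scylla-0"])
while actions := reconcile(3, cluster):
    for action in actions:
        verb, name = action.split()
        if verb == "add":
            cluster.members.append(name)
        else:
            cluster.members.remove(name)
print(cluster.members)  # ['scylla-0', 'scylla-1', 'scylla-2']
```

A real operator watches Kubernetes resources and reacts to events instead of spinning in a local loop, but the compare-then-act structure is the same idea.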

The PostgreSQL Query Planner

Command Prompt, Inc.

Robert Haas

Why does my query need a plan? Sequential scan vs. index scan. Join strategies. Join reordering. Joins you can't reorder. Join removal. Aggregates and DISTINCT. Using EXPLAIN. Row count and cost estimation. Things the query planner doesn't understand. Other ways the planner can fail. Parameters you can tune. Things that are nearly always slow. Redesigning your schema. Upcoming features and future work.
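To make "sequential scan vs. index scan" and "cost estimation" concrete, here is a deliberately simplified model of how a cost-based planner can choose between the two. The constants use the default values of PostgreSQL's `seq_page_cost`, `random_page_cost`, `cpu_tuple_cost`, and `cpu_index_tuple_cost` settings, but the formulas are an assumption for illustration only: PostgreSQL's real cost functions also account for index pages, physical correlation, caching, and more.

```python
# Defaults of the real PostgreSQL planner settings with these names.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005


def seq_scan_cost(pages: int, rows: int) -> float:
    """Toy model: read every page sequentially and process every row."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def index_scan_cost(rows: int, selectivity: float) -> float:
    """Toy model, crude worst case: one random heap-page fetch per matching row."""
    matched = rows * selectivity
    return matched * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)


def choose_scan(pages: int, rows: int, selectivity: float) -> str:
    """Pick whichever access path the toy model estimates as cheaper."""
    if index_scan_cost(rows, selectivity) < seq_scan_cost(pages, rows):
        return "index scan"
    return "seq scan"


# 1,000,000 rows on 10,000 pages: the seq scan is estimated at 20,000 cost units.
print(choose_scan(10_000, 1_000_000, 0.001))  # index scan (about 4,015 units)
print(choose_scan(10_000, 1_000_000, 0.10))   # seq scan (about 401,500 units)
```

The crossover shows why row count estimates matter: a bad selectivity estimate can push the planner onto the wrong side of this comparison.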

Thrift vs Protocol Buffers vs Avro - Biased Comparison

Igor Anishchenko

Odessa Java TechTalks, Lohika, May 2012. Let's take a step back and compare data serialization formats, of which there are plenty. What are the key differences between Apache Thrift, Google Protocol Buffers, and Apache Avro? Which is "The Best"? The truth of the matter is that they are all very good and each has its own strong points. Hence, the answer depends as much on personal choice as on understanding the historical context of each and correctly identifying your own individual requirements.
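One low-level technique the three formats share is variable-length integer encoding: Protocol Buffers, Thrift's compact protocol, and Avro all write integers as base-128 varints, and Avro, Thrift compact, and Protobuf's `sint32`/`sint64` types first apply zigzag encoding so that small negative numbers also stay short. A minimal sketch of both steps, assuming values fit in a signed 64-bit integer:

```python
def zigzag64(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3.

    Assumes -2**63 <= n < 2**63.
    """
    return (n << 1) ^ (n >> 63)


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, low-order group first,
    high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("zigzag-encode signed values before varint-encoding them")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)
            return bytes(out)


print(encode_varint(300).hex())           # ac02
print(encode_varint(zigzag64(-1)).hex())  # 01
```

Without zigzag, -1 in 64-bit two's complement would need the maximum 10 varint bytes; with it, -1 maps to 1 and fits in a single byte.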

Implementing Highly Performant Distributed Aggregates

Designing how to implement aggregates in a distributed database is a non-trivial task. When dealing with aggregates that will be polling the entire cluster, it is important to consider performance impacts. If done poorly, full table scans can bring production systems to their knees. So how can you implement aggregate functions without hammering real-time availability and performance for other read/write operations? Learn how distributed aggregates were implemented in ScyllaDB to balance performance across large NoSQL distributed database clusters.
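The general shape of a parallel full-scan aggregate can be sketched as map-reduce over the token ring: split the token range into sub-ranges, have each worker compute a mergeable partial result for its sub-range, and merge the partials on the coordinator. This is a single-process illustration under stated assumptions (a signed 64-bit token space, rows given as `(token, value)` pairs, all names hypothetical), not ScyllaDB's implementation; in a real cluster each sub-range task would typically run in parallel on a node that owns that range.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed token space: signed 64-bit, as with a Murmur3-style partitioner.
TOKEN_MIN, TOKEN_MAX = -(2**63), 2**63 - 1


@dataclass
class Partial:
    """Mergeable partial state: AVG needs sum and count, not a local average."""
    count: int = 0
    total: float = 0.0

    def merge(self, other: Partial) -> Partial:
        return Partial(self.count + other.count, self.total + other.total)


def split_ranges(n: int) -> list[tuple[int, int]]:
    """Split the whole token space into n contiguous half-open ranges [start, end)."""
    span = (TOKEN_MAX - TOKEN_MIN + 1) // n
    bounds = [TOKEN_MIN + i * span for i in range(n)] + [TOKEN_MAX + 1]
    return list(zip(bounds, bounds[1:]))


def partial_aggregate(rows: list[tuple[int, float]], start: int, end: int) -> Partial:
    """What one worker would compute over its own sub-range (here: a filter over all rows)."""
    p = Partial()
    for token, value in rows:
        if start <= token < end:
            p.count += 1
            p.total += value
    return p


def distributed_avg(rows: list[tuple[int, float]], n_subranges: int) -> float:
    """Coordinator: one task per sub-range, then merge the partial results."""
    result = Partial()
    for start, end in split_ranges(n_subranges):
        result = result.merge(partial_aggregate(rows, start, end))
    return result.total / result.count if result.count else float("nan")


rows = [(TOKEN_MIN, 8.0), (-5, 1.0), (0, 2.0), (10**18, 3.0), (TOKEN_MAX, 6.0)]
print(distributed_avg(rows, 4))  # 4.0
```

Because the partial state is mergeable, the answer does not depend on how many sub-ranges the scan is split into, which is what allows the work to be spread out and throttled.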

Play with FILE Structure - Yet Another Binary Exploit Technique

Angel Boy

Everything You Always Wanted to Know About Kafka’s Rebalance Protocol but Wer...

Apache Kafka is a scalable streaming platform with built-in dynamic client scaling. The elastic scale-in/scale-out feature leverages Kafka’s “rebalance protocol,” which was designed in the 0.9 release and has been improved ever since. The original design targeted on-prem deployments of stateless clients; however, it does not always align with modern deployment tools like Kubernetes or with stateful stream processing clients like Kafka Streams. These shortcomings led to two major recent improvement proposals, namely static group membership and incremental rebalancing (which will hopefully be available in version 2.3). This talk provides a deep dive into the details of the rebalance protocol, from its original design in version 0.9 up to the latest improvements and future work. We discuss internal technical details and the pros and cons of the existing approaches, and explain how to configure your client correctly for your use case. Additionally, we discuss configuration tradeoffs for stateless, stateful, on-prem, and containerized deployments.
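The core difference between the original (eager) protocol and incremental rebalancing is how much work is interrupted when group membership changes. The sketch below contrasts how many partitions each approach takes away from running consumers when a new member joins. It models only the revocation step (in the incremental protocol, revoked partitions are assigned to their new owner in a follow-up rebalance), and the consumer and partition names are hypothetical.

```python
from typing import Dict, Set

Assignment = Dict[str, Set[str]]  # consumer id -> partitions it owns


def eager_revocations(old: Assignment, new: Assignment) -> Assignment:
    """Eager protocol: every member revokes all of its partitions.

    `new` is deliberately unused: eager revocation ignores the target assignment.
    """
    return {member: set(parts) for member, parts in old.items()}


def cooperative_revocations(old: Assignment, new: Assignment) -> Assignment:
    """Incremental protocol: a member revokes only partitions that move elsewhere."""
    return {member: parts - new.get(member, set()) for member, parts in old.items()}


def total(a: Assignment) -> int:
    return sum(len(p) for p in a.values())


old = {"c1": {"p0", "p1", "p2"}, "c2": {"p3", "p4", "p5"}}
# A third consumer joins and takes one partition from each existing member.
new = {"c1": {"p0", "p1"}, "c2": {"p3", "p4"}, "c3": {"p2", "p5"}}

print(total(eager_revocations(old, new)))        # 6
print(total(cooperative_revocations(old, new)))  # 2
```

For stateful clients such as Kafka Streams, every revoked partition can mean state to restore, so revoking two partitions instead of six is the whole point of the incremental approach.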

Galera explained 3

Marco Tusa

Best practices for MySQL High Availability

Colin Charles

The MariaDB/MySQL world is full of tradeoffs, and choosing a high availability (HA) solution is no exception. This session aims to look at all the alternatives in an unbiased way. Preference is of course only given to open source solutions. How do you choose between: asynchronous/semi-synchronous/synchronous replication, MHA (MySQL high availability tools), DRBD, Tungsten Replicator, or Galera Cluster? Do you integrate Pacemaker and Heartbeat like Percona Replication Manager? The cloud brings even more fun, especially if you are dealing with a hybrid cloud and must think about geographical redundancy. What about newer solutions like using Consul for MySQL HA? When you’ve decided on your solution, how do you provision and monitor these solutions? This and more will be covered in a walkthrough of MySQL HA options and when to apply them.

My sql failover test using orchestrator

YoungHeon (Roy) Kim

Query Optimizer – MySQL vs. PostgreSQL

Christian Antognini

A Deep Dive Into Understanding Apache Cassandra

Inside Cassandra – C* is an interesting piece of software for many reasons, but it is especially interesting in its use of elegant data structures and algorithms. This talk will focus on the data structures and algorithms that make C* such a scalable and performant database. We will walk along the write, read and delete paths exploring the low-level details of how each of these operations work. We will also explore some of the background processes that maintain availability and performance. The goal of this talk is to gain a deeper understanding of C* by exploring the low-level details of its implementation.
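The write, read, and delete paths this talk covers follow the log-structured merge (LSM) pattern. The toy class below is a minimal sketch of that pattern, not Cassandra's code: writes go to a commit log and an in-memory memtable, full memtables are flushed to immutable SSTables, reads gather all versions and keep the newest timestamp, deletes write a tombstone, and compaction merges SSTables. Timestamps come from a simple logical counter, and tombstones are purged immediately at compaction, whereas Cassandra keeps them for `gc_grace_seconds`.

```python
from __future__ import annotations

TOMBSTONE = object()  # sentinel value written by deletes


class ToyLSM:
    """Toy log-structured merge store illustrating write/read/delete paths."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: list[tuple] = []  # append-only log for crash recovery
        self.memtable: dict = {}           # key -> (timestamp, value), in memory
        self.sstables: list[dict] = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit
        self._clock = 0                    # logical clock standing in for write timestamps

    def write(self, key, value) -> None:
        self._clock += 1
        self.commit_log.append((self._clock, key, value))  # 1. durable append
        self.memtable[key] = (self._clock, value)          # 2. in-memory update
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key) -> None:
        # A delete is a write of a tombstone: older data is shadowed, not erased in place.
        self.write(key, TOMBSTONE)

    def flush(self) -> None:
        self.sstables.append(dict(self.memtable))  # write out an immutable SSTable
        self.memtable.clear()
        self.commit_log.clear()  # simplified: flushed data no longer needs replay

    def read(self, key):
        # Gather every version of the key; the newest timestamp wins.
        versions = [t[key] for t in (self.memtable, *self.sstables) if key in t]
        if not versions:
            return None
        _, value = max(versions, key=lambda v: v[0])
        return None if value is TOMBSTONE else value

    def compact(self) -> None:
        # Merge all SSTables, keeping only the newest version of each key.
        merged: dict = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        # Purge tombstones right away (safe here only because every SSTable is merged).
        self.sstables = [{k: v for k, v in merged.items() if v[1] is not TOMBSTONE}]


db = ToyLSM()
db.write("a", 1)
db.write("b", 2)  # memtable full: flushed to the first SSTable
db.write("a", 3)
db.delete("b")    # tombstone; memtable full again: second SSTable
print(db.read("a"), db.read("b"))  # 3 None
db.compact()
print(db.sstables)  # [{'a': (3, 3)}]
```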

Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond

Beyond the immediate schema changes supported in Scylla Open Source 5.0, learn how the Raft consensus infrastructure will enable radical new capabilities. Discover how it will enable more dynamic topology changes, tablets, immediate consistency, better and faster elasticity, and simplification to repair operations. To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
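Raft's safety rests on a few small rules. One of them, from the Raft paper, decides when a leader may treat a log entry (such as a schema change) as committed: the entry must be stored on a majority of servers and must belong to the leader's current term. A minimal sketch of that rule follows; the function name and argument layout are assumptions for illustration, not ScyllaDB's code.

```python
from __future__ import annotations


def advance_commit_index(commit_index: int, match_index: list[int],
                         log_terms: list[int], current_term: int) -> int:
    """Leader-side Raft rule: advance commit_index to the largest N such that a
    majority of servers have replicated entry N and log entry N is from current_term.

    match_index: highest replicated log index per server, leader included (1-based).
    log_terms:   log_terms[i - 1] is the term of the leader's log entry i.
    """
    majority = len(match_index) // 2 + 1
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


# Five servers (leader first); the leader's log holds five entries.
print(advance_commit_index(0, [5, 5, 3, 2, 1], [1, 1, 2, 2, 2], current_term=2))  # 3
# Entries 1-2 are on a majority but from an older term, so counting replicas is not enough.
print(advance_commit_index(0, [3, 2, 2], [1, 1, 2], current_term=2))  # 0
```

The second call shows why the term check matters: committing an older-term entry just because it reached a majority could let a future leader overwrite it.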

Maria db 이중화구성_고민하기 (Thinking Through a MariaDB Redundancy Setup)

Apache Kudu: Technical Deep Dive  

Cloudera, Inc.

MySQL/MariaDB Proxy Software Test

I Goo Lee

Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook

The Hive

This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.

RocksDB Performance and Reliability Practices

Yoshinori Matsunobu

Meta/Facebook's database serving social workloads runs on top of MyRocks (MySQL on RocksDB). This means our performance and reliability depend heavily on RocksDB. Besides MyRocks, we also have other important systems running on top of RocksDB. We have learned many lessons from operating and debugging RocksDB at scale. In this session, we will offer an overview of RocksDB and its key differences from InnoDB, and share a few interesting lessons learned from production.

MariaDB MaxScale monitor manual

Codership Oy - Creators of Galera Cluster

1. About MariaDB MaxScale
2. Common Monitor Parameters
   1) Parameters
   2) Script events
   3) Monitor Crash Safety
   4) Script example
3. MariaDB Monitor
   1) Master selection
   2) Configuration
   3) MariaDB Monitor optional parameters
   4) Cluster manipulation operations
      - Operation details
      - Manual activation
      - Automatic activation
      - Limitations and requirements
      - External master support
      - Configuration parameters
   5) Cooperative monitoring
      - Releasing locks
   6) Troubleshooting
      - Failover / switchover fails
      - Slave detection shows external masters
   7) Using the MariaDB Monitor With Binlogrouter
4. Galera Monitor
   1) Configuration
   2) Galera Monitor optional parameters
   3) Interaction with Server Priorities
5. ColumnStore Monitor
   1) Required Grants
   2) Master Selection
   3) Configuration
   4) Commands
   5) Example
      - Adding a Node
      - Removing a Node
6. Automatic Failover With MariaDB Monitor
   1) Manual Failover
   2) Automatic Failover
   3) Rejoin
   4) Switchover

Galera Cluster Best Practices for DBA's and DevOps Part 1

Real-time Data Streaming from Oracle to Apache Kafka

Kernel Recipes 2019 - Faster IO through io_uring

Anne Nicolas

Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records

Deep Dive into Cassandra

Brent Theisen

RocksDB compaction

MIJIN AN

Similar to Scylla Summit 2022: Making Schema Changes Safe with Raft

Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records

Deep Dive into Cassandra

Brent Theisen

Cassandra

Robert Koletka

Apache Cassandra at the Geek2Geek Berlin

Christian Johannsen

Intro to Cassandra

Archaic database technologies just don't scale under the always-on, distributed demands of modern IoT, mobile, and web applications. We'll start this Intro to Cassandra by discussing how its approach is different and why so many awesome companies have migrated from the cold clutches of the relational world into the warm embrace of peer-to-peer architecture. After this high-level opening discussion, we'll briefly unpack the following:
• Cassandra's internal architecture and distribution model
• Cassandra's data model
• Reads and writes
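Cassandra's distribution model places data on a token ring: each partition key is hashed to a token, the node owning the range that contains that token stores the first replica, and further replicas go to the next nodes clockwise (as with `SimpleStrategy`). The sketch below illustrates that placement under simplifying assumptions: one token per node (no vnodes), no rack or data-center awareness, and a truncated MD5 hash standing in for the Murmur3 hash used by Cassandra's default partitioner. `TokenRing` and its methods are hypothetical names.

```python
from __future__ import annotations

import bisect
import hashlib


def token(key: str) -> int:
    """Stand-in hash: first 8 bytes of MD5 as a signed 64-bit integer (not Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big", signed=True)


class TokenRing:
    """One token per node, no vnodes, no racks or data centers."""

    def __init__(self, node_tokens: dict[str, int]) -> None:
        self.ring = sorted((t, node) for node, t in node_tokens.items())
        self.tokens = [t for t, _ in self.ring]

    def replicas_for_token(self, tok: int, rf: int) -> list[str]:
        # The first node whose token is >= tok owns the range (previous token, its token];
        # past the highest token, wrap around to the start of the ring.
        start = bisect.bisect_left(self.tokens, tok) % len(self.ring)
        n = min(rf, len(self.ring))
        # Additional replicas: keep walking clockwise.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(n)]

    def replicas(self, key: str, rf: int) -> list[str]:
        return self.replicas_for_token(token(key), rf)


ring = TokenRing({"A": -100, "B": 0, "C": 100})
print(ring.replicas_for_token(50, 2))    # ['C', 'A']
print(ring.replicas_for_token(-150, 2))  # ['A', 'B']
```

Because any node can compute this placement from the ring alone, every node can act as coordinator without a central lookup service, which is the peer-to-peer property the abstract refers to.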

Modeling Data and Queries for Wide Column NoSQL

Cassandra: Open Source Bigtable + Dynamo

jbellis

How to leave the ORM at home and write SQL

MariaDB plc

Cassandra Talk: Austin JUG

Stu Hood

Intro to cassandra

Aaron Ploetz

Cassandra Day Denver 2014: Introduction to Apache Cassandra

Speaker: Jon Haddad, Technical Evangelist for Apache Cassandra at DataStax. This is a crash course introduction to Cassandra. You'll come away understanding how it's possible to utilize this distributed database to achieve high availability across multiple data centers, scale out as your needs grow, and not be woken up at 3am just because a server failed. We'll cover the basics of data modeling with CQL and see how that data is stored on disk. We'll wrap things up by setting up Cassandra locally, so bring your laptops!

Introduction to Cassandra - Denver

Jon Haddad

Scaling web applications with cassandra presentation

Murat Çakal

Use Your MySQL Knowledge to Become an Instant Cassandra Guru

Tim Callaghan

Introduction to Cassandra

shimi_k

On Rails with Apache Cassandra

Stu Hood

Performance Tuning Best Practices

webhostingguy

Oss4b - pxc introduction

Frederic Descamps

Chicago Kafka Meetup

Cliff Gilmore

NewSQL Database Overview

Steve Min

More from ScyllaDB

Optimizing NoSQL Performance Through Observability

ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor! Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track. This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example:
- Common issues getting up and running with the monitoring stack
- Using the CQL optimizations dashboard
- Common issues causing high latency in a node
- Common issues causing replica imbalance
- What a healthy system looks like in terms of memory
- Key metrics to keep an eye on

This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.

Event-Driven Architecture Masterclass: Challenges in Stream Processing

Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...

Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...

Developer Data Modeling Mistakes: From Postgres to NoSQL

See where an RDBMS pro’s intuition leads him astray – and learn practical tips for the data modeling transition. ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning. Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts. This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example:
- Understanding query-first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns

This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.

What Developers Need to Unlearn for High Performance NoSQL

See where an RDBMS pro’s intuition leads him astray – and learn practical tips for the transition. ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning. Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts. Our first webinar of this series will cover common mistakes with practices such as:
- Translating the data model to NoSQL
- Optimizing table design
- Optimizing query performance
- Planning for partitioning

This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.

Low Latency at Extreme Scale: Proven Practices & Pitfalls

Expert tips on how to maximize your database performance at scale. Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit millisecond latencies. In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams. We’ll cover how to:
- Design and deploy a large-scale distributed database cluster
- Optimize your clients’ interactions with it
- Expand the cluster horizontally and globally
- Ensure it survives whatever disasters the world throws at it

Dissecting Real-World Database Performance Dilemmas

Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments. Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll:
- Examine the context and technical requirements
- Talk about potential solutions and cover the pros and cons of each
- Disclose what approach the team took, and how it worked out

About the speaker: Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.

Beyond Linear Scaling: A New Path for Performance with ScyllaDB

Linear scaling (sometimes near-linear scaling) is often cited in benchmarks, articles, and product comparisons as proof that one technology and its algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care? This webinar discusses performance beyond linear scalability, including what typically matters more when running high-throughput and low-latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance

Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.

Dissecting Real-World Database Performance Dilemmas

Navigating Complex Database Performance Hurdles. Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments. Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma:
- The presenters will describe the context and technical requirements
- Together, we’ll talk about potential solutions and cover the pros and cons of each
- Finally, we’ll disclose what approach the team took, and how it worked out

Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.

Database Performance at Scale Masterclass: Workload Characteristics by Felipe...

Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...

Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna

Replacing Your Cache with ScyllaDB

Technical risks of putting a cache in front of your database – and what to do instead. Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture. Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching is a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of ScyllaDB's specialized row-based cache
- Real-world examples of why and how teams eliminated their external cache with ScyllaDB
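In the side-cache (cache-aside) approach listed among the caching approaches in this abstract, the application checks the cache first and, on a miss, reads from the database and populates the cache itself. The minimal sketch below, with hypothetical names and a plain `dict` standing in for the database, also shows one risk that approach carries: a write that goes only to the database leaves a stale copy in the cache until it is evicted or invalidated.

```python
from __future__ import annotations

from collections import OrderedDict


class LRUCache:
    """Small least-recently-used cache standing in for an external cache service."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


def cache_aside_get(key, cache: LRUCache, db: dict):
    """Side-cache read: the application, not the database, fills the cache on a miss."""
    value = cache.get(key)
    if value is None:
        value = db.get(key)
        if value is not None:
            cache.put(key, value)
    return value


db = {"user:1": "v1"}
cache = LRUCache(capacity=100)
print(cache_aside_get("user:1", cache, db))  # v1 (miss, then cached)
db["user:1"] = "v2"                          # write goes to the database only
print(cache_aside_get("user:1", cache, db))  # v1: stale until evicted or invalidated
```

Keeping the two copies consistent (invalidation on write, TTLs, or write-through) is extra application logic that a cache built into the database avoids.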

Powering Real-Time Apps with ScyllaDB: Low Latency & Linear Scalability

Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample real-time app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate. Join us to learn:
- Why and how to ensure your database takes full advantage of your cloud infrastructure
- What architectural considerations matter most for high throughput and low latency
- Key factors to consider when selecting a high-performance database
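P99 latency is the latency that 99% of requests meet or beat, so a small fraction of slow requests can dominate it. A minimal sketch using the nearest-rank percentile method; the sample data is invented for illustration, and monitoring systems often compute percentiles from histograms or with interpolation, so their values can differ slightly.

```python
import math


def percentile_nearest_rank(samples: list, p: float) -> float:
    """Smallest sample such that at least p% of all samples are <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# 1,000 requests in milliseconds: 98.9% take 0.4 ms, 1.1% take 5 ms.
latencies_ms = [0.4] * 989 + [5.0] * 11
print(percentile_nearest_rank(latencies_ms, 50))  # 0.4
print(percentile_nearest_rank(latencies_ms, 99))  # 5.0
```

Only 11 slow requests out of 1,000 are enough to move P99 from 0.4 ms to 5 ms, which is why tail latency targets are much harder to hit than averages.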

7 Reasons Not to Put an External Cache in Front of Your Database

Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture. Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover:
- Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache)
- 7 specific reasons why external caching can be a bad choice
- Why Linux’s default caching doesn’t work well for databases
- The advantages & architecture of specialized row-based caches
- Real-world examples of why and how teams eliminated their external cache

Getting the most out of ScyllaDB

Expert tips on how to maximize your database potential. If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case? This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to:
- Infrastructure selection
- ScyllaDB configuration
- Client-side setup
- Data modeling

NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration

NoSQL Database Migration Masterclass - Session 3: Migration Logistics

NoSQL Data Migration Masterclass - Session 1: Migration Strategies and Challenges