Database Performance at Scale Masterclass: Database Internals by Pavel Emelyanov and Botond Denes

Pavel Emelyanov, Principal Engineer at ScyllaDB
Botond Denes, C++ Developer at ScyllaDB

What performance-minded engineers need to know. Hear from Pavel Emelyanov and Botond Dénes on the impact of database internals: specifically, what to look for if you need latency and/or throughput improvements.

Database Internals
Pavel Emelyanov & Botond Dénes

Hardware (and kernel) matters
■ CPU
■ Memory
■ Disk I/O
■ Networking

Share nothing
■ Cross-cores locking costs
○ Cache lines
○ CPU ticks
■ Amdahl’s law
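
Amdahl's law gives a feel for why serialized, cross-core sections are so costly. The sketch below assumes a workload where a fraction `serial_fraction` of the work cannot run in parallel (for example, time spent under a shared lock). The function name and the sample numbers are illustrative and not taken from the talk.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work is serialized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be within [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    parallel_fraction = 1.0 - serial_fraction
    # Serial part takes the same time; parallel part is divided across cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload with 5% of the time spent in a serialized section.
    for cores in (1, 8, 64, 256):
        print(cores, round(amdahl_speedup(0.05, cores), 2))
```

With 5% of the work serialized, 64 cores give at most roughly a 15x speedup, and no number of cores can exceed 20x, which is one argument for keeping shared state off the hot path.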

Processes/Threads → Fibers
■ Context switch time matters
■ Enforced preemption switch also matters
■ Locking synchronization is a must
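
A minimal sketch of the fiber idea, using Python generators as a stand-in: each task runs until it voluntarily yields, so there is no enforced preemption, and a read-modify-write between two yields needs no lock. This only illustrates cooperative scheduling on one thread; it is not how any particular framework implements fibers, and all names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Generator, List

Fiber = Generator[None, None, None]


def worker(name: str, steps: int, state: Dict[str, int], log: List[str]) -> Fiber:
    """A fiber: runs until it voluntarily yields; it is never preempted."""
    for _ in range(steps):
        # Read-modify-write with no yield in between, so no lock is needed.
        value = state["counter"]
        state["counter"] = value + 1
        log.append(name)
        yield  # cooperative switch point


def run_round_robin(fibers: List[Fiber]) -> None:
    """Resume each fiber in turn until all of them have finished."""
    ready: Deque[Fiber] = deque(fibers)
    while ready:
        fiber = ready.popleft()
        try:
            next(fiber)
        except StopIteration:
            continue  # fiber finished; drop it from the run queue
        ready.append(fiber)


if __name__ == "__main__":
    shared = {"counter": 0}
    trace: List[str] = []
    run_round_robin([worker("a", 2, shared, trace), worker("b", 3, shared, trace)])
    print(shared["counter"], trace)
```

Switching between fibers here is a plain function return and call in user space, with no kernel context switch involved.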

Quirks
■ Execution stages
■ Branch (mis)prediction

Allocation
■ Fragmentation
■ Slab-like allocation
■ Log-structured allocation
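
To make the slab idea concrete, here is a small sketch assuming fixed-size slots carved out of one contiguous buffer. Because every slot has the same size, a freed slot always fits the next allocation of that size class, which is how slab-like schemes sidestep fragmentation. The class and method names are hypothetical; real allocators add per-size-class slabs, alignment, and per-core ownership.

```python
from typing import List


class Slab:
    """Fixed-size object slots carved out of one contiguous buffer."""

    def __init__(self, slot_size: int, slot_count: int) -> None:
        self.slot_size = slot_size
        self.buffer = bytearray(slot_size * slot_count)
        # Free list of slot indexes; pop() hands out the lowest index first
        # and reuses the most recently freed slot.
        self.free_slots: List[int] = list(range(slot_count - 1, -1, -1))

    def alloc(self) -> int:
        """Return the byte offset of a free slot, or raise if the slab is full."""
        if not self.free_slots:
            raise MemoryError("slab exhausted")
        return self.free_slots.pop() * self.slot_size

    def free(self, offset: int) -> None:
        """Return a slot to the free list."""
        if offset % self.slot_size != 0:
            raise ValueError("offset is not a slot boundary")
        self.free_slots.append(offset // self.slot_size)

    def view(self, offset: int) -> memoryview:
        """Writable view over exactly one slot."""
        return memoryview(self.buffer)[offset:offset + self.slot_size]
```

Allocation and release are both constant-time list operations, and the buffer never needs compaction.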

Cache control
■ Let kernel or application do it?
■ Caching can exist at different levels
○ Raw IO blocks
○ Decoded objects
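
The two caching levels can be sketched as two LRU caches stacked on top of each other: one holding raw blocks, one holding decoded objects. This is an application-managed illustration under simple assumptions (LRU eviction, one object per block); `LruCache`, `make_reader`, and the callbacks are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

V = TypeVar("V")


class LruCache(Generic[V]):
    """Least-recently-used cache with a fixed entry capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, V]" = OrderedDict()

    def get_or_load(self, key: Hashable, load: Callable[[], V]) -> V:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = load()
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return value


def make_reader(block_cache, object_cache, read_block, decode):
    """Build a lookup that checks decoded objects first, then raw blocks, then disk."""

    def get(block_no: int):
        def load_object():
            raw = block_cache.get_or_load(block_no, lambda: read_block(block_no))
            return decode(raw)

        return object_cache.get_or_load(block_no, load_object)

    return get
```

A hit in the object cache skips both the read and the decode step; a hit in the block cache skips only the read, so the two levels trade memory for different kinds of saved work.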

Types of I/O
■ Buffered reads/writes
■ Memory mapped IO
■ Direct IO
■ Asynchronous direct IO
○ io_uring
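
As a small illustration of the first two I/O types, the sketch below reads the same byte range once through a buffered read and once through a memory mapping. Direct and asynchronous direct I/O are left out because they require aligned buffers and platform-specific support. The helper names are hypothetical.

```python
import mmap
import os
import tempfile


def read_buffered(path: str, offset: int, length: int) -> bytes:
    """Buffered read: data is copied from the kernel page cache into our buffer."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_mmap(path: str, offset: int, length: int) -> bytes:
    """Memory-mapped read: pages are faulted in on access; slicing copies them out."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


if __name__ == "__main__":
    # Write a small scratch file, then read the same range both ways.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 32)
        scratch = tmp.name
    try:
        print(read_buffered(scratch, 4096, 8) == read_mmap(scratch, 4096, 8))
    finally:
        os.remove(scratch)
```

Both paths go through the kernel page cache; they differ in who triggers the read (an explicit call versus a page fault), which matters for how predictable the latency is.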

FS vs Disk
■ Filesystem adds a level of manageability
■ Extra IO operations
○ Blocks allocation
○ Inodes metadata
○ Journal

How modern SSDs work
■ Random reads – YES
■ Random writes – NO
○ Blocks vs Pages
○ Internal parallelism
○ FTL and background operations
■ Sustained vs Burst throughput
■ Mixed workload handling
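
The "blocks vs pages" point can be put into numbers with a deliberately simplified model: flash is written in pages but erased in whole blocks, so before erasing a block the FTL has to relocate whatever pages in it are still valid. The function below assumes a steady state in which every reclaimed block holds the same number of valid pages; the name and the sample figures are illustrative only.

```python
def write_amplification(pages_per_block: int, valid_pages: int) -> float:
    """Flash page writes per host page write when garbage collection reclaims
    blocks that still hold `valid_pages` live pages (simplified model)."""
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    freed_pages = pages_per_block - valid_pages  # room gained for new host writes
    relocated_pages = valid_pages                # copies the FTL must make first
    return (freed_pages + relocated_pages) / freed_pages


if __name__ == "__main__":
    # Hypothetical device with 256 pages per erase block.
    for valid in (0, 128, 192):
        print(valid, write_amplification(256, valid))
```

In this model, overwrites that invalidate whole blocks cost nothing extra (factor 1.0), while scattered random writes that leave blocks three-quarters valid make the device write four pages for every page the host writes.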

Tribute to ANK
■ Linux kernel TCP/IP has been kicking arses since day 0
■ Coupled with epoll/AIO
■ Can be tuned for both
○ RPC messages ping pong
○ Data streaming
■ System call switches still matter

DPDK
■ Removes extra switches
■ Works better in poll mode

IRQ binding
■ NIC IRQ processing times
■ NIC Soft-IRQ processing times
■ Binding IRQs (and Soft-IRQs) to specific cores
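
On Linux, IRQ affinity is expressed as a hexadecimal CPU bitmask written to `/proc/irq/<irq>/smp_affinity`. The sketch below only computes such a mask for a set of cores; which IRQ belongs to the NIC and which cores to reserve are deployment-specific assumptions, and `cpu_mask` is a hypothetical helper name.

```python
from typing import Iterable


def cpu_mask(cpus: Iterable[int]) -> str:
    """Hex bitmask with bit N set for each CPU N, in comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    groups = []
    while True:
        groups.append(f"{mask & 0xFFFFFFFF:08x}")  # lowest 32 bits first
        mask >>= 32
        if mask == 0:
            break
    # The most significant group is written first.
    return ",".join(reversed(groups))


if __name__ == "__main__":
    # Hypothetical choice: steer an IRQ to cores 0 and 1.
    # Applying it needs root, e.g. writing the string to /proc/irq/<irq>/smp_affinity.
    print(cpu_mask([0, 1]))
```

Keeping interrupt handling on a few dedicated cores leaves the remaining cores free of NIC interrupt work, at the price of giving up those cores for request processing.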

CPU
■ Thread-per-core architecture
■ User-space scheduler

Memory
■ Row-cache
■ Page-cache
■ Full control over allocations

Disk I/O
■ Direct I/O
■ User-space scheduler

Optimizing for performance and reducing latency is a hard problem. Options include choosing different algorithms and data structures, improving SQL queries, adding a cache, serving requests asynchronously, or low-level optimizations that require a deep understanding of the OS, kernel, compiler, or network stack. The engineering effort is usually nontrivial, and only if you're lucky will you see tangible results. That said, some performance optimization techniques take only a few lines of code, or already exist in a built-in library, and can lead to surprisingly noticeable results. One of them is "fail fast, retry soon". These techniques are often neglected or taken for granted.

In distributed systems, a service or a database consists of a fleet of nodes that functions as one unit. It is not uncommon for some nodes to go down, usually for a short time. When this occurs, failures can surface on the client side and lead to an outage. To build resilient systems and reduce the probability of failure, we're going to explore three topics: timeouts, backoff, and jitter. We'll talk about which timeout to set, the pitfalls of retries, how backoff improves resource utilization, and how jitter reduces congestion. Furthermore, we're going to see an adaptive mechanism that adjusts these settings dynamically. This is inspired by a real production use case where DynamoDB p99 and max latency went down from > 10s to ~500ms after employing these three techniques.

AWS articles, specifically M. Brooker's writings, and SDK code have been great resources for diving into these techniques:
- Timeouts, retries and backoff with jitter in the AWS Builder's Library, 2019 (https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/)
- Exponential Backoff and Jitter on the AWS Architecture Blog, 2016 (https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
- Fixing retries with token buckets and circuit breakers, Marc's Blog, 2022 (https://brooker.co.za/blog/2022/02/28/retries.html)
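
The backoff-and-jitter idea described above can be sketched as capped exponential backoff with "full jitter", where each retry waits a random time between zero and an exponentially growing ceiling. This is a minimal illustration, not the code of any SDK: the function names, the default values, and the choice to retry only on `TimeoutError` are assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float, cap: float, rng: random.Random) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base: float = 0.05,          # assumed first-retry ceiling, in seconds
    cap: float = 1.0,            # assumed upper bound on any single delay
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call `operation`, retrying on TimeoutError with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: fail fast to the caller
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

Randomizing the delay spreads retries from many clients over time, so a brief node outage is less likely to be followed by a synchronized wave of retries.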

Stability Patterns for Microservices

pflueras

P99 Pursuit: 8 Years of Battling P99 Latency

ScyllaDB

Performance engineering is a Sisyphean hill climb for perfection. Those who climb the hill are hardly ever satisfied with the results. You should always ask yourself where the bottleneck is today and what’s holding you back. Great performance improves your software. It enables you to run fewer layers, manage 10x fewer machines, simplify your stack, and more. In this keynote session, ScyllaDB CEO Dor Laor will cover the principles for the successful creation of projects like ScyllaDB, KVM, and the Linux kernel, and explain why they spurred his vision for the P99 CONF.

Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...

HostedbyConfluent

Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Ethan Guo | Current 2022

Back in 2016, Apache Hudi brought transactions and change capture on top of data lakes, which is today referred to as the Lakehouse architecture. In this session, we first introduce Apache Hudi and the key technology gaps it fills in the modern data architecture. Bridging traditional data lakes and warehouses, Hudi helps realize the Lakehouse vision by bringing transactions and optimized table metadata to data lakes, along with powerful storage layout optimizations, moving them closer to the cloud warehouses of today. Viewed through a data engineering lens, Hudi also plays a key unifying role between the batch and stream processing worlds by acting as a columnar, serverless "state store" for batch jobs, ushering in what we call the incremental processing model, where batch jobs can consume new data and update/delete intermediate results in a Hudi table instead of recomputing/rewriting the entire output like old-school big batch jobs.

The rest of the talk focuses on a deep dive into some of the time-tested design choices and tradeoffs in Hudi that help power some of the largest transactional data lakes on the planet today. We will start with a tour of the storage format design, including data and metadata layouts and, of course, Hudi's timeline, an event log that is central to implementing ACID transactions and concurrency control. We will delve deeper into the practical concurrency control pitfalls in data lakes and show how Hudi's hybrid approach, combining MVCC with optimistic concurrency control, lowers contention and unlocks minute-level near real-time commits to Hudi tables. We will conclude with code examples that showcase Hudi's rich set of table services that perform vital table management, such as cleaning older file versions, compaction of delta logs into base files, dynamic re-clustering for faster query performance, and the more recently introduced indexing service that maintains Hudi's multi-modal indexing capabilities.

The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...

Dremio Corporation

Essentially every successful analytical DBMS in the market today makes use of column-oriented data structures. In the Hadoop ecosystem, Apache Parquet (and Apache ORC) provide similar advantages in terms of processing and storage efficiency. Apache Arrow is the in-memory counterpart to these formats and has been embraced by over a dozen open source projects as the de facto standard for in-memory processing. In this session the PMC Chair for Apache Arrow and the PMC Chair for Apache Parquet discuss the future of column-oriented processing.

The Linux Block Layer - Built for Fast Storage

Kernel TLV

The arrival of flash storage introduced a radical change in the performance profiles of direct-attached devices. At the time, it was obvious that the Linux I/O stack needed to be redesigned to support devices capable of millions of IOPS with extremely low latency. In this talk we revisit the changes made to the Linux block layer over the last decade or so that made it what it is today: a performant, scalable, robust, and NUMA-aware subsystem. In addition, we cover the new NVMe over Fabrics support in Linux. Speaker: Sagi Grimberg, Principal Architect and co-founder at LightBits Labs.

Back to the future with C++ and Seastar

Tzach Livyatan

Measuring P99 Latency in Event-Driven Architectures with OpenTelemetry

ScyllaDB

While there are numerous benefits to Event-Driven Architecture, like improved productivity, flexibility, and scalability, they also pose a few disadvantages, such as the complexity of measuring end-to-end latency and identifying bottlenecks in specific services. This talk shows you how to produce telemetry from your services using an open standard to retain control of data. OpenTelemetry allows you to instrument your application code through vendor-neutral APIs, libraries, and tools. It provides the tools necessary for you to gain visibility into the performance of your services and overall latency. Anton will share his experience building high-throughput services and strategies to use distributed tracing in an optimal way without affecting the overall performance of the services.

Transparent Data Encryption in PostgreSQL and Integration with Key Management...

Masahiko Sawada

How to be Successful with Scylla

ScyllaDB

Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection with 3GB in size? Those are just some of the many questions we see users asking themselves on a daily basis over our mailing list, slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.

Achieving High Availability in PostgreSQL

Mydbops

Redis vs Aerospike

Sayyaparaju Sunil

Everything you always wanted to know about Redis but were afraid to ask

Carlos Abalde

Storing State Forever: Why It Can Be Good For Your Analytics

Yaroslav Tkachenko

State is an essential part of the modern streaming pipelines: it enables a variety of foundational capabilities like windowing, aggregation, enrichment, etc. But usually, the state is either transient, so we only keep it until the window is closed, or it's fairly small and doesn't grow much. But what if we treat the state differently? The keyed state in Flink can be scaled vertically and horizontally, it's reliable and fault-tolerant... so is scaling a stateful Flink application that different from scaling any data store like Kafka or MySQL? At Shopify, we've worked on a massive analytical data pipeline that's needed to support complex streaming joins and correctly handle arbitrarily late-arriving data. We came up with an idea to never clear state and support joins this way. We've made a successful proof of concept, ingested all historical transactional Shopify data and ended up storing more than 10 TB of Flink state. In the end, it allowed us to achieve 100% data correctness.

Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records

ScyllaDB

Outrageous Performance: RageDB's Experience with the Seastar Framework

ScyllaDB

Learn how RageDB leveraged the Seastar framework to build an outrageously fast graph database. Understand the right way to embrace the triple digit multi-core future by scaling up and not out. Sacrifice everything for speed and get out of the way of your users. No drivers, no custom protocols, no query languages, no GraphQL, just code in and JSON out. Exploit the built in Seastar HTTP server to tie it all together.

Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing

ScyllaDB

Running a virtual machine will obviously add some overhead over running on bare metal. This is expected. But there are some cases that the overhead is much higher than expected. This talk discusses using tracing to analyze this overhead from a Linux host running KVM. Ideally, the guest would also be running Linux to get a more detailed explanation of the events, but analysis can still be done when the guest is something else.

RocksDB detail

MIJIN AN

Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data

Jignesh Shah

Linux Memory Management

Ni Zo-Ma

Kafka internals

David Groozman

Aerospike Architecture

Peter Milne

Stop the Guessing: Performance Methodologies for Production Systems

Brendan Gregg

Talk presented at Velocity 2013. Description: When faced with performance issues on complex production systems and distributed cloud environments, it can be difficult to know where to begin your analysis, or to spend much time on it when it isn’t your day job. This talk covers various methodologies, and anti-methodologies, for systems analysis, which serve as guidance for finding fruitful metrics from your current performance monitoring products. Such methodologies can help check all areas in an efficient manner, and find issues that can be easily overlooked, especially for virtualized environments which impose resource controls. Some of the tools and methodologies covered, including the USE Method, were developed by the speaker and have been used successfully in enterprise and cloud environments.

Streaming Event Time Partitioning with Apache Flink and Apache Iceberg - Juli...

Flink Forward

Netflix’s playback data records every user interaction with video on the service, from trailers on the home page to full-length movies. This is a critical dataset with high volume that is used broadly across Netflix, powering product experiences, AB test metrics, and offline insights. In processing playback data, we depend heavily on event-time partitioning to handle a long tail of late arriving events. In this talk, I’ll provide an overview of our recent implementation of generic event-time partitioning on high volume streams using Apache Flink and Apache Iceberg (Incubating). Built as configurable Flink components that leverage Iceberg as a new output table format, we are now able to write playback data and other large scale datasets directly from a stream into a table partitioned on event time, replacing the common pattern of relying on a post-processing batch job that “puts the data in the right place”. We’ll talk through what it took to apply this to our playback data in practice, as well as challenges we hit along the way and tradeoffs with a streaming approach to event-time partitioning.

Modeling Data and Queries for Wide Column NoSQL

ScyllaDB

P99CONF — What We Need to Unlearn About Persistent Storage

ScyllaDB

System software engineers have long been taught that disks are slow and sequential I/O is key to performance. With SSD drives I/O really got much faster but not simpler. In this brave new world of rocket-speed throughputs an engineer has to distinguish sustained workload from bursts, (still) take care about I/O buffer sizes, account for disks’ internal parallelism and study mixed I/O characteristics in advance. In this talk we will share some key performance measurements of the modern hardware we’re taking at ScyllaDB and our opinion about the implications for the database and system software design.

Linux NUMA & Databases: Perils and Opportunities

Raghavendra Prabhu

Similar to Database Performance at Scale Masterclass: Database Internals by Pavel Emelyanov and Botond Denes

P99CONF — What We Need to Unlearn About Persistent Storage

ScyllaDB

Linux NUMA & Databases: Perils and Opportunities

Raghavendra Prabhu

Caching in

RichardWarburton

Modern computationally intensive tasks are rarely bottlenecked by the absolute performance of your processor cores. The real bottleneck in 2013 is getting data out of memory. CPU caches are designed to alleviate the difference in performance between CPU core clock speed and main memory clock speed, but developers rarely understand how this interaction works or how to measure or tune their application accordingly. This session aims to address this by: • Describing how CPU caches work in the latest Intel hardware • Showing what and how to measure in order to understand the caching behavior of software • Giving examples of how this affects Java program performance and what can be done to address poor cache utilization

Cache in Chromium: Disk Cache

Chang W. Doh

Memorymapping.ppt

JeevanathanRavi

Caching in (DevoxxUK 2013)

RichardWarburton

Modern computationally intensive tasks are rarely bottlenecked on the absolute performance of your processor cores; the real bottleneck in 2012 is getting data out of memory. CPU caches are designed to alleviate the difference in performance between CPU core clock speed and main memory clock speed, but developers rarely understand how this interaction works or how to measure or tune their application accordingly. This talk aims to solve that by: 1. Describing how the CPU caches work in the latest Intel hardware. 2. Showing people what and how to measure in order to understand the caching behaviour of their software. 3. Giving examples of how this affects Java program performance and what can be done to address things.

virtula memory.ppt

RAHULsingh156889

Ext4 write barrier

Somdutta Roy

Mongodb meetup

Eytan Daniyalzade

Kafka on ZFS: Better Living Through Filesystems

confluent

(Hugh O'Brien, Jet.com) Kafka Summit SF 2018 You’re doing disk IO wrong, let ZFS show you the way. ZFS on Linux is now stable. Say goodbye to JBOD, to directories in your reassignment plans, to unevenly used disks. Instead, have 8K Cloud IOPS for $25, SSD speed reads on spinning disks, in-kernel LZ4 compression and the smartest page cache on the planet. (Fear compactions no more!) Learn how Jet’s Kafka clusters squeeze every drop of disk performance out of Azure, all completely transparent to Kafka. -Striping cheap disks to maximize instance IOPS -Block compression to reduce disk usage by ~80% (JSON data) -Instance SSD as the secondary read cache (storing compressed data), eliminating >99% of disk reads and safe across host redeployments -Upcoming features: Compressed blocks in memory, potentially quadrupling your page cache (RAM) for free We’ll cover: -Basic Principles -Adapting ZFS for cloud instances (gotchas) -Performance tuning for Kafka -Benchmarks

Threads - Why Can't You Just Play Nicely With Your Memory_

Robert Burrell Donkin

SSD for Your Database, Peter Zaitsev (Percona)

Ontico

TokuDB internals / Vladislav Lesin (Percona)

Ontico

TokuDB is an implementation of so-called fractal trees for MySQL. Fractal trees are the same B+ trees, but with a message buffer in every node. Messages describe data changes. Because all changes accumulate in the message buffers and move down the tree only as needed, a fractal tree loses far less speed as it grows than a B+ tree does. Professionals who build high-load systems know the internals of MySQL's InnoDB engine well. Knowing an engine's internal mechanisms helps you understand how to configure it correctly and diagnose performance dips. The goal of this talk is to describe how TokuDB is built. The talk covers the implementation of subsystems such as caching, logging and recovery, checkpoints, transactions, MVCC, compression, and locking.

Distro Recipes 2013 : My ${favorite_linux_distro} is slow!

Anne Nicolas

The life and times

Abeer Naskar

CSL Seminar presented by Cassiano Campes - 16-11-14

Cassiano Campes

Learn about log structured file system

Gang He

Operating Systems

Geetha Kannan

Case study of BtrFS: A fault tolerant File system

Kumar Amit Mehta

Flash USB

JAVED MALIK

More from ScyllaDB

Optimizing NoSQL Performance Through Observability

ScyllaDB

ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. But before you squeeze, make sure you know what to monitor! Watch our experienced Postgres developer work through monitoring and performance strategies that help him understand what mistakes he’s made moving to NoSQL. And learn with him as our database performance expert offers friendly guidance on how to use monitoring and performance tuning to get his sample Rust application on the right track. This webinar focuses on using monitoring and performance tuning to discover and correct mistakes that commonly occur when developers move from SQL to NoSQL. For example: - Common issues getting up and running with the monitoring stack - Using the CQL optimizations dashboard - Common issues causing high latency in a node - Common issues causing replica imbalance - What a healthy system looks like in terms of memory - Key metrics to keep an eye on This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.

Event-Driven Architecture Masterclass: Challenges in Stream Processing

ScyllaDB

Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...

ScyllaDB

Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...

ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL

ScyllaDB

See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning. Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts. This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. For example: - Understanding query first design principles - Planning for schema evolution - Steering clear of common pitfalls and anti-patterns - Assessing data access patterns This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.

What Developers Need to Unlearn for High Performance NoSQL

ScyllaDB

See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the transition ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning. Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts. Our first webinar of this series will cover common mistakes with practices such as: - Translating the data model to NoSQL - Optimizing table design - Optimizing query performance - Planning for partitioning This isn’t “Death-by-Powerpoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.

Low Latency at Extreme Scale: Proven Practices & Pitfalls

ScyllaDB

Expert tips on how to maximize your database performance at scale Untangle the complexity of achieving database performance at scale. Join this webinar to discover commonly overlooked ways to get predictable low latency, even at extreme scale. Our Solution Architects will walk you through the strategies and pitfalls learned by working on thousands of real-world distributed database projects, many reaching 1M OPS with single-digit MS latencies. In addition to offering clear recommendations, we’ll also explain the process behind how we arrived at them – so you can benefit from the lessons learned by other teams. We’ll cover how to: - Design and deploy a large-scale distributed database cluster - Optimize your clients’ interactions with it - Expand the cluster horizontally and globally - Ensure it survives whatever disasters the world throws at it

Dissecting Real-World Database Performance Dilemmas

ScyllaDB

Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments. Join us for an interactive session where we dissect four specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma, we'll: - Examine the context and technical requirements - Talk about potential solutions and cover the pros and cons of each - Disclose what approach the team took, and how it worked out About the speaker: Felipe is an IT specialist with years of experience in distributed systems and open-source technologies. He is one of the co-authors of "Database Performance at Scale", an Open Access, freely available publication for individuals interested in improving database performance. At ScyllaDB, he works as a Solution Architect.

Beyond Linear Scaling: A New Path for Performance with ScyllaDB

ScyllaDB

Linear scaling (sometimes near linear scaling) is often mentioned in several benchmarks, articles and product comparisons as proof that a given technology and algorithmic optimizations perform better than another. But is that really what performance is all about, and should you even care? This webinar discusses performance beyond linear scalability, including what typically matters more when running high throughput and low latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on: - The hidden aspects of linear scaling - When linear scaling matters most and when it’s simply irrelevant - Often overlooked considerations for optimizing and measuring distributed systems performance Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.

Dissecting Real-World Database Performance Dilemmas

ScyllaDB

Navigating Complex Database Performance Hurdles Tackling your own database performance challenges is serious business. For a change of pace, let’s have some fun learning from other teams’ performance predicaments. Join us for an interactive session where we dissect 4 specific database performance challenges faced by teams considering or using ScyllaDB. For each dilemma: - The presenters will describe the context and technical requirements - Together, we’ll talk about potential solutions and cover the pros and cons of each - Finally, we’ll disclose what approach the team took, and how it worked out Throughout the event, we’ll have opportunities to win ScyllaDB swag and prizes! Come prepared to engage in lively discussions and gain valuable insight into database performance strategies.

Replacing Your Cache with ScyllaDB

ScyllaDB

Technical risks of putting a cache in front of your database, and what to do instead. Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. External caches can be one of the more problematic components of distributed application architecture. Join this webinar for a technical discussion of the risks associated with using an external cache and a look at how ScyllaDB’s cache implementation simplifies your architecture without compromising latency. We’ll cover: - Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache) - 7 specific reasons why external caching is a bad choice - Why Linux’s default caching doesn’t work well for databases - The advantages & architecture of ScyllaDB's specialized row-based cache - Real-world examples of why and how teams eliminated their external cache with ScyllaDB

Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability

ScyllaDB

Discover how your team can achieve low latency at the extreme scale that your data-intensive applications require. We’ll walk you through an example of how ScyllaDB scales linearly to achieve 1M and then 2M OPS – with <1ms P99 latency. We’ll cover how this works on a sample realtime app (an ML feature store), share best practices for performance, and talk about the most important tradeoffs you’ll need to negotiate. Join us to learn: - Why and how to ensure your database takes full advantage of your cloud infrastructure - What architectural considerations matter most for high throughput and low latency - Key factors to consider when selecting a high-performance database

7 Reasons Not to Put an External Cache in Front of Your Database.pptx

ScyllaDB

Teams experiencing subpar latency commonly turn to an external cache to meet the required SLAs. Placing a cache in front of your database might seem like a fast and easy fix, but it often ends up introducing unanticipated complexity, costs, and risks. Caches can be one of the more problematic components of distributed application architecture. Join this webinar for a technical discussion of the risks associated with using an external cache and a look at an alternative strategy that simplifies your architecture without compromising latency. We’ll cover: - Different approaches to caching (pre-caching vs. caching, side cache vs. transparent cache) - 7 specific reasons why external caching can be a bad choice - Why Linux’s default caching doesn’t work well for databases - The advantages & architecture of specialized row-based caches - Real-world examples of why and how teams eliminated their external cache

Getting the most out of ScyllaDB

ScyllaDB

Expert tips on how to maximize your database potential If you’re considering or getting started with ScyllaDB, you’re probably intrigued by its potential to achieve high throughput and predictable low latency at a reasonable cost. So how do you ensure that you’re maximizing that potential for your team’s specific workloads and use case? This webinar offers practical advice for navigating the various decision points you’ll face as you assess whether ScyllaDB is a good fit for your team and later roll it out into production. We’ll cover the most critical considerations, tradeoffs, and recommendations related to: - Infrastructure selection - ScyllaDB configuration - Client-side setup - Data modeling

NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration

ScyllaDB

NoSQL Database Migration Masterclass - Session 3: Migration Logistics

ScyllaDB

NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges

ScyllaDB

ScyllaDB Virtual Workshop

ScyllaDB

Build the foundation for success with ScyllaDB Ready to try out ScyllaDB and want to make sure you’re “doing it right?” We’ll help you get up and running, fast. Spend an hour with our architects for a crash course in what ScyllaDB is all about, the core concepts you need to know, and a step-by-step demonstration of how to get started. During the live, interactive one-hour session, you will learn: - Critical considerations for designing a NoSQL system and NoSQL data model - The technology underlying ScyllaDB’s high performance, availability, and scalability – and best practices for taking advantage of it - How to install, deploy and operate a full working ScyllaDB system, including multi-data center deployment, monitoring, and connecting an application to the ScyllaDB cluster By the end of the session, you’ll have the knowledge and tools you need to get ScyllaDB running on your laptop, connect your application to it, and see what it’s like to use ScyllaDB for your specific use case.

DBaaS in the Real World: Risks, Rewards & Tradeoffs

ScyllaDB

What do you give up – and gain – when moving to a fully-managed cloud database? Now that database-as-a-Service (DBaaS) offerings have been “battle tested” in production, how is the reality matching up to the expectation? What can teams thinking of adopting a fully-managed DBaaS can learn from teams who have years of experience working with this deployment model? Join this webinar to dive into the reality of working with various high-performance DBaaS offerings. We’ll cover the following topics, all supported with real-world examples: - Developer flexibility - Cost variability - Security & privacy - Performance impact - Transparency & troubleshooting

Build Low-Latency Applications in Rust on ScyllaDB

ScyllaDB

Hands-on workshop to explore the affinities between Rust, the Tokio framework, and ScyllaDB NoSQL. ScyllaDB is a perfect match for Rust. Similar to the Rust programming language and the Tokio framework, ScyllaDB is built on an asynchronous, non-blocking runtime that works extremely well for building highly-reliable low-latency distributed applications. In this workshop, we’ll build a sample Rust application on our high performance native Rust client driver. By compiling and walking through the code, you’ll learn how to craft queries to a locally running ScyllaDB cluster. We’ll cover how to: - Install and compile a sample app, built on ScyllaDB’s native Rust SDK. - Get a ScyllaDB cluster up and running - Connect the application to the database - Review data modeling, query types, and best practices - Manage and monitor the database for consistently low latencies If you’re an application developer with an interest in Rust and Tokio, this workshop is for you!

Database Performance at Scale Masterclass: Database Internals by Pavel Emelyanov and Botond Denes

1. Database Internals Pavel Emelyanov & Botond Dénes

2. Hardware (and kernel) matters
■ CPU
■ Memory
■ Disk I/O
■ Networking

3. CPU

4. Share nothing
■ Cross-core locking costs
○ Cache lines
○ CPU ticks
■ Amdahl’s law
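For reference, Amdahl's law bounds the speedup from adding cores when part of the work stays serial (for example, work done under a cross-core lock). The worked numbers below are illustrative assumptions, not figures from the talk: p is the fraction of work that can run in parallel and N is the number of cores.

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
\qquad\text{and}\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Illustrative example: p = 0.95, N = 16
S(16) = \frac{1}{0.05 + \frac{0.95}{16}} = \frac{1}{0.109375} \approx 9.14
% Even with unlimited cores, the same workload cannot exceed 1 / 0.05 = 20x.
```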

5. Processes/Threads → Fibers
■ Context switch time matters
■ Enforced preemption switch also matters
■ Locking synchronization is a must
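The idea of fibers can be sketched with Python generators: each fiber runs until it voluntarily yields, and a user-space loop picks the next one, so no kernel context switch or enforced preemption is involved. This is a minimal illustration only; the names `worker` and `run_round_robin` are hypothetical and this is not how any particular database implements its scheduler.

```python
from collections import deque
from typing import Generator, Iterable, List


def worker(name: str, steps: int, log: List[str]) -> Generator[None, None, None]:
    """A fiber: does one unit of work, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative switch point, decided by the fiber itself


def run_round_robin(fibers: Iterable[Generator[None, None, None]]) -> None:
    """A tiny user-space scheduler: resume each fiber in turn until all finish."""
    queue = deque(fibers)
    while queue:
        fiber = queue.popleft()
        try:
            next(fiber)          # resume the fiber until its next yield
        except StopIteration:
            continue             # fiber finished, drop it from the run queue
        queue.append(fiber)      # still has work, schedule it again
```

Because switches happen only at `yield`, the fibers in this sketch never interleave in the middle of a step, which is why they can share the `log` list without a lock.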

6. Quirks
■ Execution stages
■ Branch (mis)prediction

7. Memory

8. Allocation
■ Fragmentation
■ Slab-like allocation
■ Log-structured allocation

9. Cache control
■ Let kernel or application do it?
■ Caching can exist at different levels
○ Raw IO blocks
○ Decoded objects

10. Disk I/O

11. Types of I/O
■ Buffered reads/writes
■ Memory mapped IO
■ Direct IO
■ Asynchronous direct IO
○ io_uring
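The first two I/O types in the list above can be contrasted in a short sketch: a buffered read copies data into a user buffer, while a memory-mapped read accesses the mapped file pages directly. The helper names and the sample file are hypothetical. Direct and asynchronous direct I/O are left out on purpose, since they need aligned buffers and filesystem support that a portable example cannot assume.

```python
import mmap
import os
import tempfile


def read_buffered(path: str, offset: int, length: int) -> bytes:
    """Buffered read: seek, then copy bytes from the file into a user buffer."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_mmap(path: str, offset: int, length: int) -> bytes:
    """Memory-mapped read: map the whole file read-only and slice the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


def make_sample_file(size: int = 16384) -> str:
    """Create a temporary file with a repeating byte pattern (illustrative data)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 251 for i in range(size)))
    return path
```

Both functions return the same bytes for the same offset and length; what differs is how the data reaches the process, which is the distinction the slide draws.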

12. FS vs Disk
■ Filesystem adds a level of manageability
■ Extra IO operations
○ Block allocation
○ Inode metadata
○ Journal

13. How modern SSDs work
■ Random reads – YES
■ Random writes – NO
○ Blocks vs Pages
○ Internal parallelism
○ FTL and background operations
■ Sustained vs Burst throughput
■ Mixed workload handling

14. Networking

15. Tribute to ANK
■ Linux kernel TCP/IP has been kicking arses since day 0
■ Coupled with epoll/AIO
■ Can be tuned for both
○ RPC messages ping pong
○ Data streaming
■ System call switches still matter

16. DPDK
■ Removes extra switches
■ Works better in poll mode

17. IRQ binding
■ NIC IRQ processing times
■ NIC Soft-IRQ processing times
■ Binding IRQs (and Soft-IRQs) to specific cores
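On Linux, an IRQ is bound to cores by writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity. The sketch below only computes that mask. The helper name `irq_affinity_mask` is hypothetical, and it is deliberately limited to CPUs 0 to 31 so the result fits in a single 32-bit group.

```python
from typing import Iterable


def irq_affinity_mask(cpus: Iterable[int]) -> str:
    """Return the hex bitmask (bit i set means CPU i) for the given CPU ids."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 32:
            # Wider masks use comma-separated 32-bit groups; not handled here.
            raise ValueError(f"CPU id out of supported range 0-31: {cpu}")
        mask |= 1 << cpu
    if mask == 0:
        raise ValueError("at least one CPU is required")
    return format(mask, "x")
```

For example, `irq_affinity_mask([0, 2])` gives `"5"`. Writing the value to the smp_affinity file requires root, and the IRQ number for a given NIC queue is system-specific, so neither step is shown.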

18. ScyllaDB Internals

19. Predictably Low Latencies

20. CPU
■ Thread-per-core architecture
■ User-space scheduler

21. Memory
■ Row-cache
■ Page-cache
■ Full control over allocations

22. Disk I/O
■ Direct I/O
■ User-space scheduler
