Scylla 5.0 introduces several new features to improve node operations and compaction:
1. Repair-based node operations (RBNO) provide more efficient, consistent, and simplified bootstrap, replace, rebuild, and other node operations by using row-level repair as the underlying mechanism instead of streaming.
2. Off-strategy compaction keeps sstables generated during node operations in a separate data set and compacts them together after the operation finishes for less compaction work and faster completion.
3. Space amplification goal (SAG) for compaction optimizes space efficiency for overwrite workloads by dynamically adapting compaction to meet latency and space goals, improving storage density.
Should I use more, smaller instances, or fewer, bigger instances? Is 1 Gbps enough for my network cards? Should I use batches? Can I have a collection 3 GB in size? These are just some of the many questions we see users asking daily on our mailing list, Slack, and support tickets. In this talk, I will explore the answers to these common questions and help you make sure your deployment meets the highest standards.
Scylla Summit 2022: IO Scheduling & NVMe Disk Modelling (ScyllaDB)
This document discusses IO scheduling and modeling NVMe disks. It explains that different components compete for limited disk resources with different priorities and may overconsume resources if not scheduled properly. It then describes using an IO scheduler to get maximum concurrency from the disk and apply request priorities while avoiding overconsumption. The document outlines a token bucket algorithm for rate limiting and previews ongoing work to implement a new scheduler in Seastar and Scylla and add related metrics and tuning capabilities.
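The token-bucket idea the talk outlines can be sketched in a few lines. This is a hypothetical Python toy (the class name and parameters are invented for illustration), not Seastar's or Scylla's actual scheduler:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: tokens refill at a fixed
    rate up to a burst capacity; each request consumes tokens and is
    admitted only if enough tokens are available."""

    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.last = time.monotonic()

    def try_consume(self, n=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

bucket = TokenBucket(rate=1, capacity=10)
print(bucket.try_consume(5))  # True: the bucket starts full
```

A real IO scheduler layers priorities on top of this: each priority class gets its own bucket (or share of one), so no class can overconsume the disk's token budget.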
MariaDB 10.0 introduces domain-based parallel replication which allows transactions in different domains to execute concurrently on replicas. This can result in out-of-order transaction commit. MariaDB 10.1 adds optimistic parallel replication which maintains commit order. The document discusses various parallel replication techniques in MySQL and MariaDB including schema-based replication in MySQL 5.6 and logical clock replication in MySQL 5.7. It provides performance benchmarks of these techniques from Booking.com's database environments.
Scylla Summit 2022: Scylla 5.0 New Features, Part 1 (ScyllaDB)
Discover the new features and capabilities of Scylla Open Source 5.0 directly from the engineers who developed it. This second block of lightning talks will cover the following topics:
- New IO Scheduler and Disk Parallelism
- Per-Service-Level Timeouts
- Better Workload Estimation for Backpressure and Out-of-Memory Conditions
- Large Partition Handling Improvements
- Optimizing Reverse Queries
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Kernel Recipes 2019 - Faster IO through io_uring (Anne Nicolas)
io_uring provides a new asynchronous I/O interface in Linux that aims to address limitations with existing interfaces like aio and libaio. It uses a ring-based model for submission and completion queues to efficiently support asynchronous I/O operations with low latency and high throughput. Though initially skeptical, Linus Torvalds ultimately merged io_uring into the Linux kernel due to improvements in missing features, ease of use, and efficiency over alternatives.
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook (The Hive)
This presentation describes the reasons why Facebook decided to build yet another key-value store, the vision and architecture of RocksDB and how it differs from other open source key-value stores. Dhruba describes some of the salient features in RocksDB that are needed for supporting embedded-storage deployments. He explains typical workloads that could be the primary use-cases for RocksDB. He also lays out the roadmap to make RocksDB the key-value store of choice for highly-multi-core processors and RAM-speed storage devices.
Scylla Summit 2022: Making Schema Changes Safe with Raft (ScyllaDB)
ScyllaDB adopted Raft as a consensus protocol in order to dramatically improve our operational aspects as well as provide strong consistency to the end-user. This talk will explain how Raft behaves in Scylla Open Source 5.0 and introduce the first end-user visible major improvement: schema changes. Learn how cluster configuration resides in Raft, providing consistent cluster assembly and configuration management. This makes bootstrapping safer and provides reliable disaster recovery when you lose the majority of the cluster.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Introduction to memcached, a caching service designed for optimizing performance and scaling in the web stack, seen from the perspective of MySQL/PHP users. Given for second-year students of the professional bachelor in ICT at Kaho St. Lieven, Gent.
Delivered as plenary at USENIX LISA 2013. video here: https://www.youtube.com/watch?v=nZfNehCzGdw and https://www.usenix.org/conference/lisa13/technical-sessions/plenary/gregg . "How did we ever analyze performance before Flame Graphs?" This new visualization invented by Brendan can help you quickly understand application and kernel performance, especially CPU usage, where stacks (call graphs) can be sampled and then visualized as an interactive flame graph. Flame Graphs are now used for a growing variety of targets: for applications and kernels on Linux, SmartOS, Mac OS X, and Windows; for languages including C, C++, node.js, ruby, and Lua; and in WebKit Web Inspector. This talk will explain them and provide use cases and new visualizations for other event types, including I/O, memory usage, and latency.
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records (ScyllaDB)
In this talk, we will discuss Happn's war story about migrating a Cassandra 2.1 cluster containing more than 68 Billion records in a counter table to ScyllaDB Open Source.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
MySQL Server Backup, Restoration, And Disaster Recovery Planning Presentation (Colin Charles)
This document discusses MySQL server backup, restoration, and disaster recovery planning. It covers when backups are needed, what to back up, best practices for performing backups, storage locations, and backup tools and methods. Key points include backing up databases, logs, and configurations regularly using tools like mysqldump, the binary log, and file copying with FLUSH TABLES WITH READ LOCK. Restoring requires both backups and binary logs to recover to a point-in-time.
The document discusses atomic DDL operations in MySQL 8.0. It describes the requirements for a transactional data dictionary storage engine and storage engines that support atomic DDL. It provides examples of how DDL statements like CREATE TABLE, DROP TABLE, and DROP SCHEMA are implemented atomically in MySQL 8.0 using a single transaction, compared to previous versions where these operations were not fully atomic. This ensures consistency after DDL operations and prevents issues like orphan files or tables.
RocksDB is an embedded key-value store written in C++ and optimized for fast storage environments like flash or RAM. It uses a log-structured merge tree to store data by writing new data sequentially to an in-memory log and memtable, periodically flushing the memtable to disk in sorted SSTables. It reads from the memtable and SSTables, and performs background compaction to merge SSTables and remove overwritten data. RocksDB supports two compaction styles - level style, which stores SSTables in multiple levels sorted by age, and universal style, which stores all SSTables in level 0 sorted by time.
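The write/read/compaction cycle described above can be illustrated with a toy log-structured merge store. This is a hypothetical Python sketch, not RocksDB's API; `TinyLSM`, its method names, and the memtable limit are all invented for illustration:

```python
import bisect

class TinyLSM:
    """Toy LSM store: writes land in an in-memory memtable; when it
    fills, it is flushed as a sorted, immutable 'sstable'; reads check
    the memtable first, then sstables newest-first; compaction merges
    sstables, keeping only the newest value for each key."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sstables = []  # list of sorted [(key, value), ...], newest last

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):      # newest first
            i = bisect.bisect_left(table, (key,))  # binary search on sorted pairs
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        merged = {}
        for table in self.sstables:  # oldest first, so newer values win
            merged.update(dict(table))
        self.sstables = [sorted(merged.items())]
```

The "level" and "universal" compaction styles differ mainly in which sstables `compact` merges and when, not in this basic mechanism.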
The document provides an overview of the InnoDB storage engine used in MySQL. It discusses InnoDB's architecture including the buffer pool, log files, and indexing structure using B-trees. The buffer pool acts as an in-memory cache for table data and indexes. Log files are used to support ACID transactions and enable crash recovery. InnoDB uses B-trees to store both data and indexes, with rows of variable length stored within pages.
Linux Performance Analysis: New Tools and Old Secrets (Brendan Gregg)
Talk for USENIX/LISA2014 by Brendan Gregg, Netflix. At Netflix performance is crucial, and we use many high to low level tools to analyze our stack in different ways. In this talk, I will introduce new system observability tools we are using at Netflix, which I've ported from my DTraceToolkit, and are intended for our Linux 3.2 cloud instances. These show that Linux can do more than you may think, by using creative hacks and workarounds with existing kernel features (ftrace, perf_events). While these are solving issues on current versions of Linux, I'll also briefly summarize the future in this space: eBPF, ktap, SystemTap, sysdig, etc.
Starting up Containers Super Fast With Lazy Pulling of Images (Kohei Tokunaga)
A talk at Container Plumbing Days about speeding up container startup by lazily pulling images on Kubernetes, containerd, BuildKit, Podman, and CRI-O with eStargz and zstd:chunked.
eStargz and Stargz Snapshotter: https://github.com/containerd/stargz-snapshotter
zstd:chunked proposal: https://github.com/containers/storage/pull/775
Patch set to enable lazy pulling on Podman and CRI-O (a.k.a. Additional Layer Store): https://github.com/containers/storage/pull/795
https://github.com/containerd/stargz-snapshotter/pull/281
Performance Wins with eBPF: Getting Started (2021), by Brendan Gregg
This document provides an overview of using eBPF (extended Berkeley Packet Filter) to quickly get performance wins as a sysadmin. It recommends installing BCC and bpftrace tools to easily find issues like periodic processes, misconfigurations, unexpected TCP sessions, or slow file system I/O. A case study examines using biosnoop to identify which processes were causing disk latency issues. The document suggests thinking like a sysadmin first by running tools, then like a programmer if a problem requires new tools. It also outlines recommended frontends depending on use cases and provides references to learn more about BPF.
ClickHouse Query Performance Tips and Tricks, by Robert Hodges, Altinity CEO (Altinity Ltd)
1. ClickHouse uses a MergeTree storage engine that stores data in compressed columnar format and partitions data into parts for efficient querying.
2. Query performance can be optimized by increasing threads, reducing data reads through filtering, restructuring queries, and changing the data layout such as partitioning strategy and primary key ordering.
3. Significant performance gains are possible by optimizing the data layout, such as keeping an optimal number of partitions, using encodings to reduce data size, and skip indexes to avoid unnecessary I/O. Proper indexes and encodings can greatly accelerate queries.
Play with FILE Structure - Yet Another Binary Exploit Technique (Angel Boy)
The document discusses exploiting the FILE structure in C programs. It provides an overview of how file streams and the FILE structure work. Key points include that the FILE structure contains flags, buffers, a file descriptor, and a virtual function table. It describes how functions like fopen, fread, and fwrite interact with the FILE structure. It then discusses potential exploitation techniques like overwriting the virtual function table or FILE's linked list to gain control of program flow. It notes defenses like vtable verification implemented in modern libc libraries.
ClickHouse Deep Dive, by Aleksei Milovidov (Altinity Ltd)
This document provides an overview of ClickHouse, an open source column-oriented database management system. It discusses ClickHouse's ability to handle high volumes of event data in real-time, its use of the MergeTree storage engine to sort and merge data efficiently, and how it scales through sharding and distributed tables. The document also covers replication using the ReplicatedMergeTree engine to provide high availability and fault tolerance.
Under the Hood of a Shard-per-Core Database Architecture (ScyllaDB)
Most databases are based on architectures that pre-date advances to modern hardware. This results in performance issues, the need to overprovision, and a high total cost of ownership. In this webinar we will discuss the advances to modern server technology and take a deep dive into Scylla’s shard-per-core architecture and our asynchronous engine, the Seastar framework.
Join us to learn how Seastar (and Scylla):
- Avoids locks and contention at the CPU level
- Bypasses kernel bottlenecks
- Implements a per-core, shared-nothing autosharding mechanism
- Utilizes modern storage hardware
- Leverages NUMA for the best RAM performance
- Balances your data across CPUs and nodes for the best, smoothest performance
Plus we’ll cover the advantages of unlocking vertical scalability.
GlusterFS is scale-out software defined storage. It was presented at LISA15 in Washington D.C. from November 8-13, 2015. The presentation covered GlusterFS installation, configuration of trusted storage pools, creating and managing distributed, replicated, and other volume types, expanding and shrinking volumes, self-healing, and accessing data using native GlusterFS clients, NFS, and SMB/CIFS. Configuration details for CTDB and sharing volumes over SMB were also provided.
RedisConf17 - Using Redis at scale @ Twitter (Redis Labs)
The document discusses Nighthawk, Twitter's distributed caching system which uses Redis. It provides caching services at a massive scale of over 10 million queries per second and 10 terabytes of data across 3000 Redis nodes. The key aspects of Nighthawk's architecture that allow it to scale are its use of a client-oblivious proxy layer and cluster manager that can independently scale and rebalance partitions across Redis nodes. It also employs replication between data centers to provide high availability even in the event of node failures. Some challenges discussed are handling "hot keys" that get an unusually high volume of requests and more efficiently warming up replicas when nodes fail.
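Rebalancing whole partitions, as the summary describes, depends on mapping each key to a stable partition id that the cluster manager can move between nodes. A minimal sketch of that idea in hypothetical Python (not Nighthawk's actual scheme; the function and partition count are invented for illustration):

```python
import hashlib

def partition_for(key, num_partitions=1024):
    """Map a cache key to a fixed partition id. Because the mapping
    depends only on the key, the cluster manager can rebalance load by
    reassigning whole partitions to nodes without rehashing keys."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

A proxy layer then keeps a small partition-to-node table; moving a hot partition updates one table entry rather than touching clients.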
Instaclustr has a diverse customer base, including Ad Tech, IoT, and messaging applications, ranging from small start-ups to large enterprises. In this presentation we share our experiences, common issues, diagnosis methods, and some tips and tricks for managing your Cassandra cluster.
About the Speaker
Brooke Jensen VP Technical Operations & Customer Services, Instaclustr
Instaclustr is the only provider of fully managed Cassandra as a Service in the world. Brooke Jensen manages our team of engineers that maintains the operational performance of our diverse fleet of clusters, as well as providing 24/7 advice and support to our customers. Brooke has over 10 years' experience as a software engineer, specializing in performance optimization of large systems, and has extensive experience managing and resolving major system incidents.
This is a summary of the sessions I attended at PASS Summit 2017. Out of the week-long conference, I put together these slides to summarize the conference and present it at my company. The slides cover my favorite sessions, the ones I found most valuable, and include screenshotted demos I personally developed and tested, just as the speakers did at the conference.
Cassandra Cluster Management by Japan Cassandra Community (Hiromitsu Komatsu)
This document discusses best practices for managing Cassandra clusters based on Instaclustr's experience managing over 500 nodes and 3 million node-hours. It covers choosing the right Cassandra version, hardware configuration, cost estimation, load testing, data modeling practices, common issues like modeling errors and overload, and important monitoring techniques like logs, metrics, cfstats and histograms. Maintaining a well-designed cluster and proactively monitoring performance are keys to avoiding issues with Cassandra.
The road to enterprise-ready OpenStack storage as a service (Sean Cohen)
The document summarizes updates and improvements to OpenStack storage services. Key points include:
- Efforts to improve high availability of Cinder APIs and services, including multipath support and active/active deployments.
- Enhancements for volume management such as attaching single volumes to multiple hosts, improved volume migration, and retype functionality.
- Updates to backup services including incremental backups, support for additional targets like NFS/POSIX, and improved integration with Swift.
- Disaster recovery features like consistency group enhancements and planned import/export of snapshots between Cinder installations.
- Work on deployment and rolling upgrades, including database cleanup utilities, object-based communication, and standardized driver
In file systems, large sequential writes are more beneficial than small random writes, and hence many storage systems implement a log-structured file system. In the same way, the cloud favors large objects over small objects. Cloud providers place throttling limits on PUTs and GETs, so it takes significantly longer to upload a bunch of small objects than a single large object of the aggregate size. Moreover, there are per-PUT costs associated with uploading smaller objects.
At Netflix, a lot of media assets and their relevant metadata are generated and pushed to the cloud.
We would like to propose a strategy to compact these small objects into larger blobs before uploading them to Cloud. We will discuss how to select relevant smaller objects, and manage the indexing of these objects within the blob along with modification in reads, overwrites and deletes.
Finally, we would showcase the potential impact of such a strategy on Netflix assets in terms of cost and performance.
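The packing-and-indexing idea described above can be sketched in a few lines. This is an illustrative toy, not Netflix's actual design; all names and the index layout are hypothetical:

```python
# Toy sketch: pack many small objects into one blob before upload, and keep a
# byte-range index so individual objects can later be read back with a single
# ranged GET instead of many small GETs.

def pack_blob(objects):
    """objects: dict of key -> bytes. Returns (blob, index)."""
    blob = bytearray()
    index = {}
    for key, data in objects.items():
        offset = len(blob)
        blob.extend(data)
        index[key] = (offset, len(data))  # byte range inside the blob
    return bytes(blob), index

def read_object(blob, index, key):
    """Reads one object back out of the blob using the index."""
    offset, length = index[key]
    return blob[offset:offset + length]

objects = {"asset-a": b"metadata-a", "asset-b": b"metadata-b-larger"}
blob, index = pack_blob(objects)
assert read_object(blob, index, "asset-b") == b"metadata-b-larger"
```

A real implementation would also have to handle overwrites and deletes (for example by rewriting or compacting the blob), which is exactly the management problem the abstract refers to.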
1) Ben Bromhead is the CTO of Instaclustr, which provides Cassandra-as-a-Service. When adding capacity to an existing Cassandra cluster, joining nodes normally bootstrap by streaming data from existing nodes, adding load.
2) "Bootstrap from backups" is proposed as a solution where joining nodes stream data directly from backups stored in object storage rather than existing cluster nodes, reducing load on the cluster.
3) This allows more reactive scaling with fewer side effects than typical predictive capacity planning approaches, and makes clusters more cost effective to run. The technique is currently in beta testing.
Cinder enhancements are proposed to better support replication and other long-running volume operations using stateless snapshots. The enhancements include allowing volume drivers to report capabilities like stateless snapshots, tracking task status separately from volume status, and replicating snapshots between backends. This would enable optimizations like transferring snapshot data directly between storage controllers instead of through Cinder.
Tez Shuffle Handler: Shuffling at Scale with Apache HadoopDataWorks Summit
In this talk we introduce a new Shuffle Handler for Tez, a YARN Auxiliary Service, that addresses the shortcomings and performance bottlenecks of the legacy MapReduce Shuffle Handler, the default shuffle service in Apache Tez. Based on our experience of running Apache Pig and Hive at scale on Apache Tez at Yahoo!, advanced features like auto-parallelism and session mode expose specific limitations in the shuffle service, which was not designed with these features in mind.
A highly auto-reduced job suffers from longer fetch times as the number of fetches per downstream task increases by the auto-reduction factor. The Apache Tez Shuffle Handler adds composite fetch which has support for multi-partition fetch to mitigate this performance slow down.
Also, since Apache Tez DAGs are run completely within a single application unlike their equivalent MapReduce jobs, intermediate shuffle data in Tez can linger beyond its usefulness. The Apache Tez Shuffle Handler provides deletion APIs to reduce disk usage for such long running Tez sessions.
As an emerging technology we will outline future roadmap for the Apache Tez Shuffle Handler and provide performance evaluation results from real world jobs at scale.
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Continuent
Amazon Web Services (AWS) are gaining popularity, and for good reasons. The Amazon Relational Database Service (AWS RDS) is getting a lot of attention, also for very good reasons. It is quite a compelling idea to have on-demand data services that do not require hiring DBA staff. The expectation is set that everything works like magic and will satisfy all of your enterprise database availability needs.
If you want to build high-volume, business-critical applications, possibly with geographically-distributed audiences, you really want to think twice about using RDS. Continuent customers have a large number of deployments in AWS running MySQL on AWS EC2 instances, and they choose to rely upon Tungsten Clustering to provide high availability (HA) and disaster recovery (DR). We also support multi-site/multi-master operations and offer true zero-downtime MySQL operations.
AGENDA
- How does RDS handle failover? (Hint: Not very quickly)
- How does RDS handle read scaling? (Hint: Not very well)
- Can you do zero-downtime maintenance with RDS? (Hint: No)
- Is RDS cheaper? (Hint: No, not really)
The document discusses three important things for IT leaders to know about SQL Server: database performance and speed matter; backups and disaster recovery plans are not all equal; and high availability/disaster recovery (HA/DR) tools provide proactive disaster protection. It provides tips on optimizing database performance through query tuning instead of hardware upgrades. It explains the importance of backing up transaction logs and having comprehensive disaster recovery plans, including solutions like AlwaysOn availability groups. The document promotes the services of SQLWatchmen for database diagnostics, tuning, disaster planning and recovery support.
Running Dataproc At Scale in production - Searce Talk at GDG DelhiSearce Inc
This document provides information about Dataproc, Google Cloud's fully managed Spark and Hadoop service. It discusses how Dataproc allows users to create clusters on-demand to process large datasets in a flexible and cost-effective manner. It also covers how Dataproc integrates with other Google Cloud services and provides open-source tools like Spark, Hadoop, Hive and Pig. Additionally, it summarizes best practices for using Dataproc such as leveraging initialization actions, specifying cluster versions, and using the Jobs API for submissions.
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
Talk delivered at SCaLE10x, Los Angeles 2012.
Cloud Computing introduces new challenges for performance
analysis, for both customers and operators of the cloud. Apart from
monitoring a scaling environment, issues within a system can be
complicated when tenants are competing for the same resources, and are
invisible to each other. Other factors include rapidly changing
production code and wildly unpredictable traffic surges. For
performance analysis in the Joyent public cloud, we use a variety of
tools including Dynamic Tracing, which allows us to create custom
tools and metrics and to explore new concepts. In this presentation
I'll discuss a collection of these tools and the metrics that they
measure. While these are DTrace-based, the focus of the talk is on
which metrics are proving useful for analyzing real cloud issues.
This document provides best practices for optimizing Blackboard Learn performance. It recommends deploying for performance from the start, optimizing platform components continuously through measurements, using scalable deployments like 64-bit architectures and virtualization, improving page responsiveness through techniques like gzip compression and image optimization, optimizing the web server, Java Virtual Machine, and database through configuration and tools. It emphasizes the importance of understanding resource utilization, wait events, execution plans, and statistics/histograms for database optimization.
All the fundamental concepts and tools for understanding performance tuning in Java. Garbage collection, memory management and collector types and tools for profiling Java applications.
This is the story of how we managed to scale and improve Tappsi’s RoR RESTful API to handle our ever-growing load - told from different perspectives: infrastructure, data storage tuning, web server tuning, RoR optimization, monitoring and architecture design.
Most mid-sized Django websites thrive by relying on memcached. Though what happens when basic memcached is not enough? And how can one identify when the caching architecture is becoming a bottleneck? We'll cover the problems we've encountered and solutions we've put in place.
PGConf.ASIA 2019 Bali - Tune Your LInux Box, Not Just PostgreSQL - Ibrar AhmedEqunix Business Solutions
This document discusses tuning Linux and PostgreSQL for performance. It recommends:
- Tuning Linux kernel parameters like huge pages, swappiness, and overcommit memory. Huge pages can improve TLB performance.
- Tuning PostgreSQL parameters like shared_buffers, work_mem, and checkpoint_timeout. Shared_buffers stores the most frequently accessed data.
- Other tips include choosing proper hardware, OS, and database based on workload. Tuning queries and applications can also boost performance.
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
Go90 is a mobile entertainment platform offering access to live and on demand videos. We built the web services platform and social features like activity feed for go90 by making heavy use of Cassandra and Scala, and would like to share what we learned during development and while operating go90. In this presentation, we cover our data model evolution from the initial prototypes to the current production version and the significant performance gain by using a better data model. We will explain how we apply time series data modeling and the benefits of using expiring columns with DateTieredCompactionStrategy. We will also talk about interesting experiences related to table modifications, tombstones and table pagination. On the operations side, we will discuss our findings on java driver usage, performance, monitoring, cluster maintenance, version upgrade, 2-way ssl and many more. We hope you can learn from our mistakes instead of making them yourself!
About the Speakers
Christopher Webster Software Engineer, AOL
Christopher Webster works on the web services platform for the go90 AOL project. Previously he was a Computer Scientist for the Mission Control Technologies project at NASA Ames Center. Chris worked as a senior staff engineer at Sun Microsystems for Project zembly, the cloud development and deployment environment as well as technical lead in many NetBeans projects. Chris is an author of the NetBeans Field Guide and Assemble the Social Web With Zembly.
Thomas Ng Software Engineer, AOL
Thomas Ng is a software engineer at AOL, building web services for the go90 mobile entertainment platform using Cassandra, Scala and Kafka.
Organizations continue to adopt Solr because of its ability to scale to meet even the most demanding workflows. Recently, LucidWorks has been leading the effort to identify, measure, and expand the limits of Solr. As part of this effort, we've learned a few things along the way that should prove useful for any organization wanting to scale Solr. Attendees will come away with a better understanding of how sharding and replication impact performance. Also, no benchmark is useful without being repeatable; Tim will also cover how to perform similar tests using the Solr-Scale-Toolkit in Amazon EC2.
Similar to Scylla Summit 2022: Scylla 5.0 New Features, Part 2 (20)
Radically Outperforming DynamoDB @ Digital Turbine with SADA and Google CloudScyllaDB
Digital Turbine, the Leading Mobile Growth & Monetization Platform, did the analysis and made the leap from DynamoDB to ScyllaDB Cloud on GCP. Suffice it to say, they stuck the landing. We'll introduce Joseph Shorter, VP, Platform Architecture at DT, who led the charge for change and can speak first-hand to the performance, reliability, and cost benefits of this move. Miles Ward, CTO @ SADA will help explore what this move looks like behind the scenes, in the Scylla Cloud SaaS platform. We'll walk you through before and after, and what it took to get there (easier than you'd guess, I bet!).
Petabytes: That's the data volume currently being managed within ScyllaDB Cloud. In this keynote, ScyllaDB's Director of Product Michael Hollander shares how ScyllaDB Cloud harnesses cutting-edge technologies to manage massive datasets efficiently, providing insights into its robust features like API and Terraform integration, data security through encryption at rest, and advanced networking options such as VPC Peering and Transit Gateway, along with upcoming features and enhancements for 2024.
The Strategy Behind ReversingLabs’ Massive Key-Value MigrationScyllaDB
ReversingLabs recently completed the largest migration in their history: migrating more than 300 TB of data, more than 400 services, and data models from their internally-developed key-value database to ScyllaDB seamlessly, and with ZERO downtime. Services using multiple tables — reading, writing, and deleting data, and even using transactions — needed to go through a fast and seamless switch. So how did they pull it off? Martina shares their strategy, including service migration, data modeling changes, the actual data migration, and how they addressed distributed locking.
In ScyllaDB 6.0, we complete the transition to strong consistency for all of the cluster metadata. In this session, Konstantin Osipov covers the improvements we introduce along the way for such features as CDC, authentication, service levels, Gossip, and others.
CTO Insights: Steering a High-Stakes Database MigrationScyllaDB
In migrating a massive, business-critical database, the Chief Technology Officer's (CTO) perspective is crucial. This endeavor requires meticulous planning, risk assessment, and a structured approach to ensure minimal disruption and maximum data integrity during the transition. The CTO's role involves overseeing technical strategies, evaluating the impact on operations, ensuring data security, and coordinating with relevant teams to execute a seamless migration while mitigating potential risks. The focus is on maintaining continuity, optimising performance, and safeguarding the business's essential data throughout the migration process.
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time MLScyllaDB
Tractian, an AI-driven industrial monitoring company, recently discovered that their real-time ML environment needed to handle a tenfold increase in data throughput. In this session, JP Voltani (Head of Engineering at Tractian), details why and how they moved to ScyllaDB to scale their data pipeline for this challenge. JP compares ScyllaDB, MongoDB, and PostgreSQL, evaluating their data models, query languages, sharding and replication, and benchmark results. Attendees will gain practical insights into the MongoDB to ScyllaDB migration process, including challenges, lessons learned, and the impact on product performance.
Inside Expedia's Migration to ScyllaDB for Change Data CaptureScyllaDB
Databases Migrations are no fun, and there are several different strategies and considerations one must be aware of prior to actually doing it in production. In this talk, Jean Carlo and Manikar Rangu will deep dive on Expedia’s migration journey from Cassandra to ScyllaDB. They cover the aspects and pitfalls the team needed to overcome as part of their Identity service project.
Terraform Best Practices for Infrastructure ScalingScyllaDB
Terraform is a GREAT tool, but like a lot of other things in life, it has its pitfalls and bad practices.
Since you are working with Terraform, you probably went through its documentation, which can tell you what resources can be used - BUT do you always have a clear path towards using these resources? How should you structure your Terraform code in general?
And what about scaling? How do you make the most of Terraform when scaling your infrastructure as your organization grows?
In this talk, I’ll cover useful best practices, pitfalls to avoid and major obstacles to anticipate so that you can scale across many teams, avoid refactoring, and get a flying start now -- AND optimize for the future.
You’ll also gain a go-to approach and a paved path for working with Terraform, whether it’s an existing codebase or entirely new functionality, and hopefully start thinking about the big picture and using Terraform in a broader context rather than just as an "infrastructure as code" tool.
Elasticity vs. State? Exploring Kafka Streams Cassandra State StoreScyllaDB
kafka-streams-cassandra-state-store' is a drop-in Kafka Streams State Store implementation that persists data to Apache Cassandra.
By moving the state to an external datastore the stateful streams app (from a deployment point of view) effectively becomes stateless. This greatly improves elasticity and allows for fluent CI/CD (rolling upgrades, security patching, pod eviction, ...).
It can also help reduce failure-recovery and rebalancing downtimes, with demos showing rebalancing downtimes of around 100ms for your stateful Kafka Streams application, no matter the size of the application’s state.
As a bonus accessing Cassandra State Stores via 'Interactive Queries' (e.g. exposing via REST API) is simple and efficient since there's no need for an RPC layer proxying and fanning out requests to all instances of your streams application.
DynamoDB to ScyllaDB: Technical Comparison and the Path to SuccessScyllaDB
What can you expect when migrating from DynamoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to DynamoDB’s. Then, hear about your DynamoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
ScyllaDB Real-Time Event Processing with CDCScyllaDB
ScyllaDB’s Change Data Capture (CDC) allows you to stream both the current state as well as a history of all changes made to your ScyllaDB tables. In this talk, Senior Solution Architect Guilherme Nogueira will discuss how CDC can be used to enable Real-time Event Processing Systems, and explore a wide-range of integrations and distinct operations (such as Deltas, Pre-Images and Post-Images) for you to get started with it.
MongoDB to ScyllaDB: Technical Comparison and the Path to SuccessScyllaDB
What can you expect when migrating from MongoDB to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compares to MongoDB’s. Then, hear about your MongoDB to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
Real-Time or Analytics Workloads... Why Not Both?ScyllaDB
ScyllaDB’s Workload Prioritization provides resource optimization and performance isolation across workloads with different performance needs, such as Analytics and Real-time. In this session, you will learn how Workload Prioritization works, how you can use it to run different types of workloads together under a single ScyllaDB cluster, and how to fine-tune priorities and resource allocation based on your specific requirements.
Supercell is the game developer behind Hay Day, Clash of Clans, Boom Beach, Clash Royale and Brawl Stars. Learn how they unified real-time event streaming for a social platform with hundreds of millions of users.
ScyllaDB Leaps Forward with Dor Laor, CEO of ScyllaDBScyllaDB
Join ScyllaDB’s CEO, Dor Laor, as he introduces the revolutionary tablet architecture that makes one of the fastest databases fully elastic. Dor will also detail the significant advancements in ScyllaDB Cloud’s security and elasticity features as well as the speed boost that ScyllaDB Enterprise 2024.1 received.
An All-Around Benchmark of the DBaaS MarketScyllaDB
The entire database market is moving towards Database-as-a-Service (DBaaS), resulting in a heterogeneous DBaaS landscape shaped by database vendors, cloud providers, and DBaaS brokers. This DBaaS landscape is rapidly evolving and the DBaaS products differ in their features but also their price and performance capabilities. In consequence, selecting the optimal DBaaS provider for the customer needs becomes a challenge, especially for performance-critical applications.
To enable an on-demand comparison of the DBaaS landscape we present the benchANT DBaaS Navigator, an open DBaaS comparison platform for management and deployment features, costs, and performance. The DBaaS Navigator is an open data platform that enables the comparison of over 20 DBaaS providers for the relational and NoSQL databases.
This talk will provide a brief overview of the benchmarked categories with a focus on the technical categories such as price/performance for NoSQL DBaaS and how ScyllaDB Cloud is performing.
Discover the Unseen: Tailored Recommendation of Unwatched ContentScyllaDB
The session shares how JioCinema approaches "watch discounting." This capability ensures that if a user has watched a certain amount of a show or movie, the platform no longer recommends that content to the user. Flawless operation of this feature promotes the discovery of new content, improving the overall user experience.
JioCinema is an Indian over-the-top media streaming service owned by Viacom18.
So You've Lost Quorum: Lessons From Accidental DowntimeScyllaDB
The best thing about databases is that they always work as intended, and never suffer any downtime. You'll never see a system go offline because of a database outage. In this talk, Bo Ingram -- staff engineer at Discord and author of ScyllaDB in Action -- dives into an outage with one of their ScyllaDB clusters, showing how a stressed ScyllaDB cluster looks and behaves during an incident. You'll learn how to diagnose issues in your clusters, see how external failure modes manifest in ScyllaDB, and how you can avoid making a fault too big to tolerate.
ScyllaDB is making a major architecture shift. We’re moving from vNode replication to tablets – fragments of tables that are distributed independently, enabling dynamic data distribution and extreme elasticity. In this keynote, ScyllaDB co-founder and CTO Avi Kivity explains the reason for this shift, provides a look at the implementation and roadmap, and shares how this shift benefits ScyllaDB users.
Getting the Most Out of ScyllaDB Monitoring: ShareChat's TipsScyllaDB
ScyllaDB monitoring provides a lot of useful information. But sometimes it’s not easy to find the root of the problem if something is wrong or even estimate the remaining capacity by the load on the cluster. This talk shares our team's practical tips on: 1) How to find the root of the problem by metrics if ScyllaDB is slow 2) How to interpret the load and plan capacity for the future 3) Compaction strategies and how to choose the right one 4) Important metrics which aren’t available in the default monitoring setup.
Essentials of Automations: Exploring Attributes & Automation ParametersSafe Software
Building automations in FME Flow can save time, money, and help businesses scale by eliminating data silos and providing data to stakeholders in real-time. One essential component to orchestrating complex automations is the use of attributes & automation parameters (both formerly known as “keys”). In fact, it’s unlikely you’ll ever build an Automation without using these components, but what exactly are they?
Attributes & automation parameters enable the automation author to pass data values from one automation component to the next. During this webinar, our FME Flow Specialists will cover leveraging the three types of these output attributes & parameters in FME Flow: Event, Custom, and Automation. As a bonus, they’ll also be making use of the Split-Merge Block functionality.
You’ll leave this webinar with a better understanding of how to maximize the potential of automations by making use of attributes & automation parameters, with the ultimate goal of setting your enterprise integration workflows up on autopilot.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. We cover every productivity app included in Office 365, outline common Office 365 migration scenarios, and explain how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
What is an RPA CoE? Session 1 – CoE VisionDianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframePrecisely
Inconsistent user experience and siloed data, high costs, and changing customer expectations – Citizens Bank was experiencing these challenges while it was attempting to deliver a superior digital banking experience for its clients. Its core banking applications run on the mainframe and Citizens was using legacy utilities to get the critical mainframe data to feed customer-facing channels, like call centers, web, and mobile. Ultimately, this led to higher operating costs (MIPS), delayed response times, and longer time to market.
Ever-changing customer expectations demand more modern digital experiences, and the bank needed to find a solution that could provide real-time data to its customer channels with low latency and operating costs. Join this session to learn how Citizens is leveraging Precisely to replicate mainframe data to its customer channels and deliver on their “modern digital bank” experiences.
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is re-paid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped keep web resources available for Ukrainians, and how AWS improved DDoS protection for all customers based on the Ukraine experience.
3. Asias He
■ Asias He is a long-time open source developer who previously
worked on Debian Project, Solaris Kernel, KVM Virtualization for
Linux and OSv unikernel. He now works on Seastar and Scylla.
Principal Software Engineer
6. What is RBNO
■ Use row level repair as the underlying mechanism to sync data between nodes
instead of streaming
■ Single mechanism for all the node operations
• Bootstrap / replace / rebuild / decommission / removenode / repair
7. Benefits of RBNO
Significant improvements on performance and data safety
■ Resumable
• Resume from previous failed bootstrap operations
■ Consistency
• Latest replica is guaranteed
■ Simplified
• No need to run repair before or after node operations like replace and removenode
■ Unified
• All node ops use the same underlying mechanism
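The core idea behind row-level repair, comparing row timestamps across replicas so every replica ends up with the latest version of each row, can be sketched as a toy model. This is illustrative only, not ScyllaDB's actual implementation (which works over hashed row ranges, not whole dicts):

```python
# Toy model of row-level repair: each replica maps row_key -> (timestamp, value).
# Repair merges all replicas with last-write-wins, then writes the merged view
# back, so the latest replica of every row is guaranteed on every node.

def repair(replicas):
    """replicas: list of dicts mapping row_key -> (timestamp, value)."""
    merged = {}
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)  # keep the newest version
    for replica in replicas:
        replica.clear()
        replica.update(merged)  # every replica now holds the latest data

r1 = {"k1": (10, "old"), "k2": (5, "x")}
r2 = {"k1": (20, "new")}
repair([r1, r2])
assert r1 == r2 == {"k1": (20, "new"), "k2": (5, "x")}
```

This is also why a bootstrap built on repair is naturally resumable and consistent: re-running the merge is idempotent, and the joining node always converges to the newest replica of each row.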
8. Towards using RBNO by default
■ Enabled by default for replace operations
■ More operations will use RBNO by default in the future
■ All node operations are supported
■ Options to turn on specific node operations
• E.g., --enable-repair-based-node-ops true
• E.g., --allowed-repair-based-node-ops replace, bootstrap
■ IO scheduler improvements to reduce latency impact
10. Introduction
Make compaction during node operations more efficient
■ What is it
• Sstables generated by node operations are kept in a separate data set
• They are compacted together and integrated into the main set when the node operation is done
■ Benefits
• Less compaction work during node operations
• Faster to complete node operations
11. Current status
■ Enabled for all node ops
• bootstrap, replace, decommission, removenode, rebuild, repair
■ Normal trigger for node ops
• Triggered at the end of the node operation
■ Smart trigger for repair
• Wait for more repairs to arrive so that more SSTables are batched into a single off-strategy compaction
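The two-set bookkeeping above can be sketched as follows (illustrative Python, not Scylla's internals; the class and method names are invented for the sketch):

```python
# Sketch of the off-strategy idea: SSTables produced by a node
# operation go into a maintenance set; when the operation finishes
# they are compacted together once and only then integrated into
# the main set, instead of triggering compaction on every arrival.

class Table:
    def __init__(self):
        self.main_set = []         # SSTables visible to normal compaction
        self.maintenance_set = []  # SSTables produced by node operations

    def add_from_node_op(self, sstable):
        # no compaction triggered here: just park the SSTable
        self.maintenance_set.append(sstable)

    def finish_node_op(self):
        if self.maintenance_set:
            # one batched compaction instead of many incremental ones
            merged = sorted(set().union(*self.maintenance_set))
            self.main_set.append(merged)
            self.maintenance_set = []

table = Table()
table.add_from_node_op({"a", "b"})
table.add_from_node_op({"b", "c"})
table.finish_node_op()
print(table.main_set)  # [['a', 'b', 'c']]
```

The "smart trigger" for repair simply delays `finish_node_op` a while longer so several repairs can share one batched compaction.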
13. What is it
■ A new feature that uses RPC verbs instead of gossip status to perform node
operations such as adding or removing a node.
14. Benefits
■ Requires all nodes to participate by default
• Avoids data inconsistency issues if nodes are network partitioned
• Users can explicitly ignore dead nodes
• nodetool removenode --ignore-dead-nodes
• scylla --ignore-dead-nodes-for-replace
■ Automatically reverts to the previous state in case of error
■ Detects user operation mistakes
• Detects and rejects multiple node operations running in parallel
■ Each operation is assigned a UUID
• Easier to identify a node operation
15. How to use it
■ No action is needed from the user
■ Enabled for bootstrap, replace, decommission and removenode.
18. Asias He
■ Asias He is a long-time open source developer who previously
worked on Debian Project, Solaris Kernel, KVM Virtualization for
Linux and OSv unikernel. He now works on Seastar and Scylla.
Principal Software Engineer
19. Agenda
■ Background of tombstones
■ Timeout based tombstone GC
■ Repair based tombstone GC
20. Background
■ Tombstones are used to delete data
■ Tombstones can’t be kept forever
■ Tombstone GC happens when
• The tombstone and the data it covers can be compacted away together
• The tombstone is old enough, i.e., older than gc_grace_seconds
■ Tombstones might be missed on some replicas
■ Data resurrection happens when
• Replica nodes with the tombstone GC it
• Replica nodes without the tombstone still contain the data that should have been deleted
• Reads return the deleted data to the user
21. Current timeout-based tombstone GC
■ Must run a full cluster-wide repair within gc_grace_seconds
■ Not robust enough
• Nothing guarantees that repair finishes in time
• Repair is a low-priority maintenance operation
■ Puts pressure on people who operate Scylla
■ Performance impact during critical periods
22. Introducing repair-based tombstone GC
■ The idea
• GC a tombstone only after repair has been performed
■ Main benefits
• No need to tune and find a proper gc_grace_seconds
• No data resurrection, even if a cluster-wide repair couldn’t be performed within gc_grace_seconds
• Less pressure on users operating Scylla clusters to run repairs in a timely manner
• Repair intensity can be throttled even more
• Reduces the latency impact on the user workload
• There is no longer a hard requirement to finish repair in time
• If repair is performed more frequently than gc_grace_seconds
• Tombstones can be garbage-collected sooner, improving performance
23. How to use it
■ ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'} ;
• The mode can be {timeout, repair, disabled, immediate}
• timeout = gc tombstone after gc_grace_seconds
• repair = gc tombstone after repair
• disabled = never gc tombstone
• immediate = gc tombstone immediately
■ CREATE TABLE ks.cf (key blob PRIMARY KEY, val blob) WITH tombstone_gc =
{'mode':'repair'};
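The four modes reduce to a simple purge predicate: a tombstone may be dropped only once every replica is known to have seen it. A hedged sketch of that decision (illustrative only; the function name and defaults are assumptions, not Scylla's code — 864000 s is the customary 10-day gc_grace_seconds default):

```python
# Illustrative purge predicate for the four tombstone_gc modes.
# Under mode=repair a tombstone is purgeable only if a repair
# completed AFTER the tombstone was written, so all replicas
# are guaranteed to have received it.

def can_gc(tombstone_write_time, last_repair_time, mode,
           gc_grace_seconds=864000, now=0):
    if mode == "immediate":
        return True
    if mode == "disabled":
        return False
    if mode == "repair":
        return (last_repair_time is not None
                and last_repair_time > tombstone_write_time)
    # mode == "timeout": classic gc_grace_seconds behavior
    return now - tombstone_write_time > gc_grace_seconds

# A tombstone written at t=100 becomes purgeable once a repair
# finishes at t=150...
print(can_gc(100, 150, "repair"))   # True
# ...but never before any repair has covered it.
print(can_gc(100, None, "repair"))  # False
```

Note how mode=repair removes the wall-clock race entirely: correctness no longer depends on repairs finishing inside a tuned window.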
24. More considerations
■ When to use mode = immediate
• Use it for TWCS with no user deletes
• Safer than gc_grace_seconds = 0
• Deletes are rejected if mode = immediate
■ When to use mode = disabled
• Tools that may load Scylla with out-of-order writes or writes in the past, e.g., sstableloader
and the CDC replicator
• Disable tombstone GC while the tools are running
■ What happens if mode = repair but repair cannot finish for some reason
• A new RESTful API can fake the repair history
• Use it in an emergency to allow GC
• Or switch back to mode = timeout
25. How to upgrade an existing cluster
■ A gossip feature, TOMBSTONE_GC_OPTIONS, is added
■ The tombstone_gc option cannot be used until the whole cluster is upgraded
• E.g., in a mixed cluster:
• cqlsh> ALTER TABLE ks.table WITH tombstone_gc = {'mode':'repair'};
• ConfigurationException: tombstone_gc option not supported by the cluster
■ To keep maximum compatibility and introduce fewer surprises to users
• All tables default to mode = timeout (same as without this feature)
• Users have to set mode = repair explicitly
28. Raphael S. Carvalho
■ Member of the ScyllaDB storage team
■ Responsible for the compaction subsystem
■ Previously worked on Syslinux and OSv
Software Engineer at ScyllaDB
29. Agenda
■ Space optimization for incremental compaction
■ “Bucketless” time series, i.e. time series made much easier for you
■ Upcoming improvements
30. Let’s take a look back
■ Incremental compaction (ICS) introduced in enterprise release 2019.1.4
■ Known for combining techniques from both size-tiered and leveled strategies
■ Fixes the 100% space overhead problem in size-tiered compaction, increasing disk utilization significantly.
31. Is it enough?
■ The space overhead in tiered compaction was efficiently fixed, however…
■ Incremental (ICS) and size-tiered (STCS) strategies share the same space amplification (~2–4x)
with overwrite workloads:
• They cover a similar region in the three-dimensional efficiency space, also known as the RUM conjecture
trade-offs.
[Diagram: RUM triangle with READ, WRITE, and SPACE axes; STCS and ICS occupy a similar region]
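The ~2–4x figure follows from simple arithmetic: under a tiered layout, several tiers can each still hold a near-complete, mostly-overwritten copy of the dataset until compaction merges them. A back-of-the-envelope sketch (illustrative numbers, not a model of Scylla internals):

```python
# Space amplification = total disk used / live (logical) data.
# With an overwrite-heavy workload, tiered layouts accumulate
# stale copies of the same rows across tiers.

def space_amplification(tier_sizes_gb, live_data_gb):
    return sum(tier_sizes_gb) / live_data_gb

# 100 GB of live data, but each of 4 tiers still holds a stale,
# mostly-overwritten copy that has not yet been merged away:
print(space_amplification([100, 100, 100, 100], 100))  # 4.0
```

Leveled compaction avoids this by keeping levels non-overlapping, at the cost of much higher write amplification, which is exactly the trade-off the hybrid approach below targets.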
32. Turns out it’s not enough. But can we do better?
■ The leveled strategy and size-tiered (or ICS) cover different regions
• Interesting regions cannot be reached with either strategy.
• But they can be reached by combining the data layout of both strategies
• i.e., a hybrid (tiered + leveled) approach
[Diagram: RUM triangle with READ, WRITE, and SPACE axes; STCS/ICS and LCS occupy different regions]
33. Let’s work to optimize space efficiency then
■ A few high-level goals:
• Optimize space efficiency with overwrite workloads
• Ensure write and read latency meet SLA requirements
34. ■ That’s Space Amplification Goal (SAG) for you.
■ Increased storage density per node? YES.
■ Reduced costs? YES.
35. A few facts about this feature
■ This feature (available since Scylla Enterprise 2020.1.6) can only be used with Incremental Compaction
■ Compaction dynamically adapts to the workload to meet requirements
■ Under heavy write load, the strategy works to meet the write latency requirement
■ Otherwise, it works to optimize space efficiency to the desired extent
■ Translates into:
• Storage Density per node ++
• Costs --
• Scale ++
[Diagram: RUM triangle with READ, WRITE, and SPACE axes; ICS+SAG reaches a new region]
36. Enabling the space optimization (SAG)
■ This will enable the feature with a space amplification goal of 1.5
■ The lower the configured value, the higher the write amplification
■ Adaptive approach minimizes the impact of extra amplification
■ Gives user control to reach interesting regions in the three-dimensional efficiency space
ALTER TABLE keyspace.table
WITH compaction = {
'class': 'IncrementalCompactionStrategy',
'space_amplification_goal': '1.5'
};
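The adaptive behavior can be pictured as a controller that switches between a write-optimized and a space-optimized mode. A simplified sketch under assumed thresholds (the function, its parameters, and the backlog limit are hypothetical; the real strategy's heuristics are more involved):

```python
# Simplified sketch of SAG-style adaptation, not Scylla's actual
# controller: under heavy write load the strategy prioritizes
# keeping up with writes; otherwise it compacts more aggressively
# until space amplification drops below the configured goal.

def choose_mode(write_backlog, space_amp, sag=1.5, backlog_limit=100):
    if write_backlog > backlog_limit:
        return "write-optimized"   # protect write latency first
    if space_amp > sag:
        return "space-optimized"   # compact harder toward the goal
    return "steady"                # goal met; minimal extra work

print(choose_mode(write_backlog=500, space_amp=2.0))  # write-optimized
print(choose_mode(write_backlog=10, space_amp=2.0))   # space-optimized
print(choose_mode(write_backlog=10, space_amp=1.2))   # steady
```

The coexistence of both modes is what makes the approach hybrid: latency requirements always win, and space efficiency is reclaimed whenever there is headroom.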
38. A common schema for time series looked like…
CREATE TABLE billy.readings (
sensor_id int,
date date,
time time,
temperature int,
PRIMARY KEY ((sensor_id, date), time)
)
39. Why bucket in time series?
■ Large partitions were known to cause all sorts of problems
• Index inefficiency when reading from the middle of a large partition
• Latency issues when repairing large partitions
• High resource usage and read amplification when querying multiple time windows
• Reactor stalls, which caused higher P99 latencies
• And so on…
■ Consequently, applications were forced to “bucket” partitions to keep their size within a limit.
40. Bucketed vs unbucketed time series
[Diagram: SSTables 1–4, one per time window. Bucketed, a single time series is split into several small partitions, one per window; unbucketed, it is a single large partition spanning all windows]
41. Time series made much easier for you!
■ But those bad days are gone!
• Scylla allows a large partition to be efficiently indexed: O(log N)
• Scylla’s row-level repair allows large partitions to be efficiently repaired
• TimeWindowCompactionStrategy can now efficiently query multiple time windows
• by discarding SSTable files whose time range is irrelevant to the query
• and incrementally opening the relevant files to reduce resource overhead
• Therefore, the read amplification and resource usage problems are fixed
42. A schema for time series can now look like…
■ There is no longer a date field in the schema, meaning that:
• The application won’t have to create new partitions at a fixed interval for a time series
• Querying a time series is much easier, as only a single partition is involved
■ The bucketing days are potentially gone!
■ Lots of complexity removed from the application side
CREATE TABLE billy.readings (
sensor_id int,
time timestamp,
temperature int,
PRIMARY KEY (sensor_id, time)
)
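The reduction in application complexity is easy to see: with daily bucketing, a range query must compute every (sensor_id, date) partition key in the range, query each one, and merge the results; unbucketed, it is a single-partition read. An illustrative sketch (the helper name is hypothetical, not a driver API):

```python
from datetime import date, timedelta

# Illustrative only: with daily bucketing the application itself must
# enumerate every (sensor_id, date) partition key covered by the query
# range, issue one query per key, and merge the per-partition results.

def bucketed_partition_keys(sensor_id, start, end):
    keys = []
    day = start
    while day <= end:
        keys.append((sensor_id, day))
        day += timedelta(days=1)
    return keys

# A 30-day range query fans out to 30 partitions when bucketed...
keys = bucketed_partition_keys(42, date(2022, 1, 1), date(2022, 1, 30))
print(len(keys))  # 30
# ...but is a single-partition read (key: sensor_id) when unbucketed.
```

All of that client-side fan-out and merge logic disappears with the unbucketed schema.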
43. Upcoming improvements
■ Compaction becoming more resilient and performant overall:
• Changes were recently made to make cleanup and major compactions more resilient when the system is running
out of disk space
• Dynamic control of compaction fan-in to increase overall compaction efficiency
• Based on the observation that efficiency is a function of the number of input files and their relative sizes
• Don’t dilute the overall efficiency by submitting jobs whose efficiency is lower than the
efficiency of ongoing jobs
• Tests show that write amplification is reduced under heavy write load while keeping space and
read amplification within bounds
• Makes the system adapt even better to changing workloads
• More stability. More performance.
44. Upcoming improvements
■ Reduce compaction aggressiveness via:
• Improvements in the I/O scheduler (Pavel Emelyanov covers this in depth in his talk)
• Improvements in the compaction backlog controller
• Aiming at improving tail latency and overall system stability.
■ Off-strategy compaction (covered in more depth in Asias He’s talk)
• Makes repair-based node operations more efficient and faster
• Consequently, better elasticity
45. Thank you!
Stay in touch
Raphael S. Carvalho
@raphael_scarv
raphaelsc@scylladb.com
Editor's Notes
Hello everyone and welcome to my talk about SSTable compaction enhancements
My name is Raphael Carvalho and I have been working on ScyllaDB storage layer since its early days. Enough about me, let’s move on to the interesting part
In this session, we’ll describe space optimization for incremental compaction that will allow the storage density of nodes to increase
Additionally, how Scylla makes it much easier to model time series data. Without having to rely on old techniques like data bucketing, which is commonly used to avoid running into large partition performance issues
Last but not least, we’ll talk about upcoming improvements that will make compaction better for you
Now, let’s take a look back. Incremental compaction strategy, or ICS, was introduced back in 2019 to solve the large space overhead that affected users. Before its existence, users were left with no choice but to leave 50% of disk space free for compactions to succeed.
But was it enough? Well, the aforementioned space overhead was efficiently fixed by incremental compaction; however, it still suffered from bad space amplification when facing overwrite-heavy workloads. That’s because the compaction strategy wasn’t efficient at removing the data redundancy accumulated across the tiers.
We use a theoretical model called the RUM conjecture to reason about compaction efficiency. It states that a compaction strategy cannot be optimal at all three efficiency goals: read, write, and space. That’s why we have different strategies available, each better suiting a particular use case.
If we look at the three-dimensional efficiency space, which represents the RUM conjecture trade-offs, we’ll see that Incremental and size tiered strategies cover a similar region.
Turns out Incremental Compaction can do much better than fixing the space overhead problem. We know for a fact that leveled and size tiered strategies cover completely different regions in the efficiency space. Also, we know that interesting regions cannot be reached with either of them. However, very interesting regions in the efficiency space can be reached by combining the concepts of both strategies. We call it a hybrid approach.
What do we actually want to accomplish with this hybrid approach? Let’s set a few goals.
First, we want to optimize space efficiency for overwrite workloads, while ensuring write and read latencies meet service-level requirements
In other words, Performance must be sufficient to meet the needs, but (space) efficiency should be as good as possible to allow for scale.
That’s Space Amplification Goal for you. A feature that will help you increase storage density per node and therefore reduce costs. Who doesn’t like that?
It is only available in Scylla Enterprise and can only be used with our incremental compaction. Everything was carefully implemented. To ensure latency meets service-level requirements, compaction dynamically adapts to the workload.
Under heavy write load, the compaction strategy enters a write-optimized mode to make sure the system can keep up with the write rate. Otherwise, the strategy continuously works to optimize space efficiency. The coexistence of both modes is the reason we call this a hybrid approach.
The adaptive approach, combined with the hybrid one, is what makes this feature unique in the compaction world.
Let’s get to a bit of action. How to enable the space optimization? That’s simply a matter of specifying a value between 1 and 2 for the strategy option named space_amplification_goal. The lower the value, the lower the space amplification but the higher the write amplification. 1.5 is a good value to start with.
In order to optimize space efficiency, we’re willing to trade-off extra write amplification. However, the adaptive approach minimizes the impact of the extra amplification given that the strategy will switch between the modes, that is, write and space, whenever it has to.
To conclude, this will nicely give user control to reach interesting regions in the efficiency space, allowing the strategy to perform better for your particular use case
Now comes the interesting part… the optimization in action. We can clearly see in the graph that the lower the configured value, the lower the disk usage will be. In the example, with a value of 1.25, the space amplification reached a maximum of 100%, but eventually went below the 50% mark. If the system isn’t under heavy write load, then space amplification will meet the goal faster. As expected, the system optimizes for space once performance objectives are achieved.
Now let’s switch gears and talk about how Scylla made time series less painful for application developers. Please look at that CREATE TABLE statement. That’s a typical schema for time series data. Note how the date field composes the partition key along with the sensor id. That’s a technique called data bucketing.
This bucketing technique is mainly used to prevent large partitions from being created, as they were known to create all sorts of performance issues.
For example: Scylla was very inefficient when reading from the middle of a large partition.
Repair was an enemy of large partitions too.
And the time window compaction strategy wasn’t optimized for reading a large partition spanning multiple time windows.
In the picture, each individual line represents a partition. In the bucketed case, the time series is split into multiple smaller partitions. While in the unbucketed case, the time series is kept in a single large partition.
One of the main problems with bucketing is that lots of complexity is pushed to the application side. The application has to keep track of all partitions that belong to a particular time series. Also, aggregation is more complex, as the application has to figure out which partitions store a particular time range, query each of them individually, and finally aggregate the results.
Fortunately, those bad days are gone.
Scylla fixed all problems aforementioned. Large partitions can be efficiently indexed, row-level repair was introduced to solve the problem of repairing large partitions, and time window compaction strategy can now efficiently read large partitions stored across multiple time windows
When a table uses time window strategy, its SSTable files do not overlap in timestamp range, so a specialized reader was implemented that will discard irrelevant files and efficiently read the relevant ones. This will reduce resource consumption and read amplification, making the queries much more efficient
With all those problems fixed, the schema for a time series application can now look much simpler. Queries become simpler. Your application becomes simpler. Just make sure you have enough time series (partitions) to avoid hot spots, where a subset of shards may be processing much more data than their counterparts. For example, if your application is monitoring millions of devices, where each has its own time series, then you will not run into imbalance issues. But if you only have a few time series in your application, it’s better to rely on the old bucketing technique to guarantee proper balancing.
As for upcoming improvements,
Cleanup and major compaction will now be more resilient when system is running out of disk space.
The compaction manager will now dynamically control the compaction fan-in, essentially a threshold on the minimum number of input files for compaction, to increase overall compaction efficiency, which translates into lower write amplification.
This decision is based on the observation that compaction efficiency is a function of the number of input files and their relative sizes. That’s why size-tiered and ICS favor compacting similar-sized files together.
Essentially, we will increase the overall efficiency by not diluting it with compaction jobs that are less efficient than the ongoing compaction jobs. The system becomes more stable and performant as a result of this change.
Last but not least, the I/O scheduler is being nicely enhanced by Pavel Emelyanov. The enhancements will make the system more stable, allowing compaction to have less impact on the other activities in the system, like user queries, streaming, and so on. To learn more about this, you can watch Pavel’s talk.
And off-strategy compaction was written with the goal of making compaction less aggressive during node operations like bootstrap, regular repair, etc., allowing them to complete faster. Consequently, the system will be able to scale faster, making Scylla’s elasticity even better.