Squeezing the Most Out of
the Storage Engine with
State of the Art Compaction
Raphael S. Carvalho, Software Engineer
Raphael Carvalho
■ Syslinux, suite of bootloaders
■ OSv, an operating system for the cloud
■ Seastar, the framework powering ScyllaDB
■ ScyllaDB, the best database in the world
“In order to make good use of the computer
resources, one must organize files intelligently,
making the retrieval process efficient.”
The Ubiquitous B-Tree paper, 1979
■ Short & precise definition from aforementioned paper:
■ “allow users to store, update, and recall”
Storage Engines
■ Two approaches for handling updates
■ In-place structure (ex: B+-tree)
■ (k1,v1)(k2,v2) + update (k1,v3) -> (k1,v3)(k2,v2): the old value is overwritten
■ Out-of-place structure (ex: LSM-tree)
■ (k1,v1)(k2,v2) + update (k1,v3) -> (k1,v1)(k2,v2) and (k1,v3): the new value is stored separately and the old one stays in place
Storage Engines
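The two update styles can be illustrated with a minimal Python sketch. The names `in_place`, `log` and `lookup` are hypothetical and chosen only for illustration; a dict stands in for a B+-tree and a list for an LSM-tree, so this is not how a real engine stores data.

```python
# In-place: one slot per key, an update overwrites the old value.
in_place = {"k1": "v1", "k2": "v2"}
in_place["k1"] = "v3"

# Out-of-place: an update is appended, the older version stays until merged away.
log = [("k1", "v1"), ("k2", "v2")]
log.append(("k1", "v3"))


def lookup(entries, key):
    """Return the newest value for key by scanning from the most recent entry."""
    for k, v in reversed(entries):
        if k == key:
            return v
    return None
```

In the out-of-place case both versions of k1 exist at once, which is why reads must prefer the newest entry and why old versions have to be merged away later.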
■ Out-of-place update isn’t new.
■ 1976 paper “Differential files” shows its applicability in the real world
■ “shown to be an efficient method for storing a large and changing database”
Storage Engines
■ A good analogy is presented in the paper
Storage Engines
■ The Log-Structured Merge-Tree (LSM-Tree) paper is then published in 1996
Storage Engines
THE LSM-TREE
[Figure: writes enter C0 in MEMORY; C1, C2, ..., Ck reside on DISK; components are combined by merge sort.]
C1 is T times bigger than C0.
C(K) is T times bigger than C(K-1).
■ Immutability of LSM tree components (ex: SSTables) simplifies
■ Concurrency control
■ Recovery
Storage Engines
Query on LSM Tree
[Figure: a query for k1 sees (k1, v2) in MEMORY and (k1, v1) on DISK.]
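A query on an LSM tree can be sketched as follows. This is a minimal illustration, assuming the in-memory component is checked first and the disk components are ordered newest first; `lsm_get`, `memtable` and `disk_components` are hypothetical names, and dicts stand in for real components.

```python
from typing import Optional


def lsm_get(key: str, memtable: dict, disk_components: list) -> Optional[str]:
    """Search memory first, then disk components from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for component in disk_components:  # assumed to be ordered newest first
        if key in component:
            return component[key]
    return None


# Same situation as the slide: the newer value lives in memory.
memtable = {"k1": "v2"}
disk_components = [{"k1": "v1"}]
result = lsm_get("k1", memtable, disk_components)
```

The search stops at the first component that holds the key, so the older (k1, v1) on disk is never returned while (k1, v2) exists in memory.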
■ A compaction policy (or strategy) defines the shape of LSM tree
■ Any policy is composed of 4 primitives
■ Trigger (when to compact)
■ File picking policy (which data to compact)
■ Granularity (how much data at once)
■ Layout (how data is laid out)
LSM-tree compaction policy
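The four primitives can be pictured as the fields of a policy object. The sketch below is only one possible framing: `SSTable`, `CompactionPolicy`, `next_job` and `toy_policy` are hypothetical names, and the toy thresholds are assumptions rather than values from any real database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SSTable:
    name: str
    size: int  # bytes


@dataclass
class CompactionPolicy:
    trigger: Callable[[List[SSTable]], bool]              # when to compact
    pick_files: Callable[[List[SSTable]], List[SSTable]]  # which data to compact
    granularity: int                                      # how many files at once
    layout: str                                           # how data is laid out

    def next_job(self, sstables: List[SSTable]) -> List[SSTable]:
        """Return the files to compact now, or an empty list if not triggered."""
        if not self.trigger(sstables):
            return []
        return self.pick_files(sstables)[: self.granularity]


# Toy policy: compact once 4 files exist, smallest files first, 4 at a time.
toy_policy = CompactionPolicy(
    trigger=lambda ssts: len(ssts) >= 4,
    pick_files=lambda ssts: sorted(ssts, key=lambda s: s.size),
    granularity=4,
    layout="tiered",
)
```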
Pure Leveled in Original LSM Design
ONLY 1 COMPONENT PER LEVEL!
[Figure: C0 in MEMORY; C1, C2, ..., Ck on DISK; components are combined by merge sort.]
Flexible Leveled in Modern LSM Design
[Figure: levels L0 and L1 on DISK, below MEMORY.]
■ Partitions the LSM-tree components into (usually fixed-size) fragments
■ Subset of a level can be merged into the next one (partial merge)
■ Bounds:
■ compaction operation time
■ temporary disk space during compaction lifetime
Partitioning Optimization for Leveled
[Figure: Partitioning Optimization for Leveled. Levels L1 and L2 on DISK, each split into SSTs along the KEY RANGE axis.]
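A partial merge only has to touch the fragments of the next level whose key range overlaps the fragment being pushed down. The sketch below assumes integer keys and inclusive ranges; `overlaps`, `partial_merge_inputs` and the sample ranges are hypothetical.

```python
from typing import List, Tuple

KeyRange = Tuple[int, int]  # inclusive (first_key, last_key)


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """True if the two inclusive key ranges share at least one key."""
    return a[0] <= b[1] and b[0] <= a[1]


def partial_merge_inputs(candidate: KeyRange, next_level: List[KeyRange]) -> List[KeyRange]:
    """Pick only the next-level fragments that overlap the candidate fragment."""
    return [r for r in next_level if overlaps(candidate, r)]


# One fragment from the upper level and four fragments in the level below.
l2 = [(0, 9), (10, 19), (20, 29), (30, 39)]
inputs = partial_merge_inputs((12, 24), l2)
```

Only two of the four fragments take part in the merge, which is what bounds both the operation time and the temporary disk space.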
Leveled Policy - Cost Analysis
■ Let T be the size ratio between adjacent levels
■ Let L be the number of levels for a given LSM tree
■ Write amplification: $O(T \cdot L)$
■ Space amplification: $O\left(\frac{T + 1}{T}\right) \approx 1.1$
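As a worked example of the leveled bounds, the sketch below plugs in T = 10 and L = 4. Both values are assumptions for illustration (the slides do not state them); T = 10 is chosen because (10 + 1) / 10 reproduces the ~1.1 figure. The function names are hypothetical.

```python
def leveled_write_amp(T: int, L: int) -> int:
    """Leveled write amplification bound from the slide: T * L."""
    return T * L


def leveled_space_amp(T: int) -> float:
    """Leveled space amplification from the slide: (T + 1) / T."""
    return (T + 1) / T


write_amp = leveled_write_amp(T=10, L=4)  # 40
space_amp = leveled_space_amp(T=10)       # 1.1
```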
Stepped-Merge Algorithm
■ 1997 paper “Incremental organization for data recording and warehousing” -> a new approach to LSM tree layout
■ “Our goal is to design a technique that supports both insertion and queries with reasonable efficiency, and without the delays of periodic batch processing.”
■ Gives birth to the tiered compaction policy
Tiered Compaction Policy
[Figure: levels L0 and L1 on DISK, ordered by FILE SIZE; SSTs of similar size accumulate in a level and are then merged into a single larger SST.]
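The tiered policy can be simulated with a toy model: files of similar size pile up in a tier, and once a tier holds T files they are merged into one file in the next tier. The sketch assumes no overwrites (so the merged size is the sum of the inputs); `flush_and_compact` and the value T = 4 are hypothetical.

```python
from typing import List


def flush_and_compact(tiers: List[List[int]], flush_size: int, T: int) -> List[List[int]]:
    """Add one flushed file to tier 0, then merge any tier that holds T files."""
    tiers = [list(t) for t in tiers] or [[]]
    tiers[0].append(flush_size)
    level = 0
    while level < len(tiers):
        if len(tiers[level]) >= T:
            merged = sum(tiers[level])  # assumes no overwrites shrink the output
            tiers[level] = []
            if level + 1 == len(tiers):
                tiers.append([])
            tiers[level + 1].append(merged)
        level += 1
    return tiers


tiers: List[List[int]] = []
for _ in range(16):
    tiers = flush_and_compact(tiers, flush_size=1, T=4)
# tiers is now [[], [], [16]]
```

Each piece of data is rewritten once per tier it passes through, which is where the O(L) write amplification of tiered compaction comes from.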
Tiered Policy - Cost Analysis
■ Let T be the size ratio between adjacent levels
■ Let L be the number of levels for a given LSM tree
■ Write amplification: $O(L)$
■ Space amplification: $O(T \cdot L)$
Now the ScyllaDB journey begins
The database inherited all the LSM-tree
improvements described so far…
But they weren’t enough
Tiered - Temporary Space Problem!
[Figure: two SSTs in a level are compacted into one new SST; while inputs and output coexist on DISK there is 100% TEMP SPACE OVERHEAD.]
Partitioning Optimization for Tiered
[Figure: each SST is split into fragments; as output fragments are written, input fragments are removed, so inputs and output never fully coexist on DISK.]
Tiered Policy - Partitioning Optimization
■ Bounds temporary space overhead significantly
■ Raises usable disk space from 50% to 80% and beyond.
■ Available in ScyllaDB as Incremental Compaction Strategy (ICS)
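The effect of partitioning on temporary space can be estimated with a rough model. The sketch below is an assumption-laden approximation, not ScyllaDB's actual accounting: whole-file compaction keeps every input until the output is complete, while the incremental variant is assumed to hold back at most one partially consumed fragment per input run. The function names, the 100 GB inputs and the 1 GB fragment size are all hypothetical.

```python
from typing import List


def temp_space_whole_file(input_sizes: List[int]) -> int:
    """Inputs are deleted only after the full output exists (no overwrites assumed)."""
    return sum(input_sizes)


def temp_space_incremental(input_sizes: List[int], fragment_size: int) -> int:
    """Rough upper bound: one not-yet-released fragment per input run."""
    return min(sum(input_sizes), len(input_sizes) * fragment_size)


inputs_gb = [100, 100]
whole = temp_space_whole_file(inputs_gb)              # 200 GB of extra space
incremental = temp_space_incremental(inputs_gb, 1)    # 2 GB of extra space
overhead_whole = whole / sum(inputs_gb)               # 1.0, i.e. 100%
overhead_incremental = incremental / sum(inputs_gb)   # 0.01, i.e. 1%
```

Under these assumptions the temporary overhead drops from 100% of the compacted data to about 1%, which is the kind of headroom that lets disk utilization go well past 50%.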
LSM tree - Efficiency Space
[Figure: an axis from SPACE OPTIMIZED to WRITE OPTIMIZED; PURE LEVELED sits at the space-optimized end and PURE TIERED at the write-optimized end.]
But the world is not only black and white
There are shades of gray in between…
Hybrid LSM-tree data layout
■ Largest level is space optimized
■ Other levels are write optimized
■ Addresses the O(K) space amplification of tiered compaction in overwrite workloads
■ Where K = number of components per level
[Figure: Hybrid LSM-tree data layout. Levels L0, L1 and L2 ordered by FILE SIZE; the largest level is the SPACE OPTIMIZED LEVEL and the other levels are WRITE OPTIMIZED LEVELS holding several SSTs.]
Hybrid LSM - Efficiency Space
[Figure: on the axis from SPACE OPTIMIZED to WRITE OPTIMIZED, HYBRID sits between PURE LEVELED and PURE TIERED.]
Hybrid LSM-tree data layout
■ Reduces space amplification in overwrite-intensive workloads
■ = less space amplification
■ = increased storage density per node
■ = more money in your pocket.
■ Available as space amplification goal (SAG) option of Incremental
Compaction Strategy.
LSM-tree & tombstones
[Figure: KEY A is stored on DISK; a TOMBSTONE for KEY A is written later to a separate file, so two entries for KEY A coexist across L0 and L1.]
Suboptimal LSM-tree tombstone handling
[Figure: two entries for KEY A in L0 and L1 (by FILE SIZE), with GARBAGE COLLECTION marked.]
[Figure: Efficient LSM-tree tombstone handling. Same layout, with GARBAGE COLLECTION marked.]
Efficient LSM-tree tombstone handling
■ Piggyback on incremental compaction, to bound temporary disk space.
■ Triggers (avoids write amplification issues):
■ File staleness
■ Tombstone density threshold
■ Available in Incremental Compaction Strategy (ICS) by default.
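The two triggers can be sketched as a simple predicate. This is a guess at how they might combine: the slide lists file staleness and a tombstone density threshold but does not say whether both must hold, so requiring both is an assumption. `SSTableStats`, `should_garbage_collect` and the sample thresholds are hypothetical and are not ScyllaDB option names.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    age_seconds: int   # time since the file was written
    tombstones: int    # number of tombstone entries in the file
    total_cells: int   # number of entries in the file


def should_garbage_collect(s: SSTableStats, min_age_seconds: int,
                           density_threshold: float) -> bool:
    """Trigger tombstone GC only for files that are both stale and tombstone-dense."""
    if s.total_cells == 0:
        return False
    stale = s.age_seconds >= min_age_seconds
    dense = s.tombstones / s.total_cells >= density_threshold
    return stale and dense  # assumption: both triggers must fire


old_and_dense = SSTableStats(age_seconds=90_000, tombstones=30, total_cells=100)
fresh_and_dense = SSTableStats(age_seconds=60, tombstones=30, total_cells=100)
```

Gating on both conditions keeps recently written files and sparsely deleted files from being rewritten over and over, which is how the triggers avoid extra write amplification.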
Thank You
Stay in Touch
Raphael Carvalho
raphaelsc@scylladb.com
@raphael_scarv
raphaelsc

More Related Content

What's hot

The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariDataWorks Summit
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsDatabricks
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...DataWorks Summit/Hadoop Summit
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache icebergAlluxio, Inc.
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoDatabricks
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversScyllaDB
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Odinot Stanislas
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsHostedbyConfluent
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High PerformanceInderaj (Raj) Bains
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsDataWorks Summit
 

What's hot (20)

The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
Managing 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with AmbariManaging 2000 Node Cluster with Ambari
Managing 2000 Node Cluster with Ambari
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Fine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark JobsFine Tuning and Enhancing Performance of Apache Spark Jobs
Fine Tuning and Enhancing Performance of Apache Spark Jobs
 
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
The Columnar Era: Leveraging Parquet, Arrow and Kudu for High-Performance Ana...
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Apache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan FlonenkoApache Spark on K8S and HDFS Security with Ilan Flonenko
Apache Spark on K8S and HDFS Security with Ilan Flonenko
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C...
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Using Apache Hive with High Performance
Using Apache Hive with High PerformanceUsing Apache Hive with High Performance
Using Apache Hive with High Performance
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 

Similar to Scaling ScyllaDB Storage Engine with State-of-Art Compaction

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...In-Memory Computing Summit
 
Oracle real application_cluster
Oracle real application_clusterOracle real application_cluster
Oracle real application_clusterPrabhat gangwar
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Databricks
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
How Incremental Compaction Reduces Your Storage Footprint
How Incremental Compaction Reduces Your Storage FootprintHow Incremental Compaction Reduces Your Storage Footprint
How Incremental Compaction Reduces Your Storage FootprintScyllaDB
 
DB2 and Storage Management
DB2 and Storage ManagementDB2 and Storage Management
DB2 and Storage ManagementCraig Mullins
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataAmazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
Learn about log structured file system
Learn about log structured file systemLearn about log structured file system
Learn about log structured file systemGang He
 

Similar to Scaling ScyllaDB Storage Engine with State-of-Art Compaction (20)

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
IMC Summit 2016 Breakout - Andy Pavlo - What Non-Volatile Memory Means for th...
 
Oracle real application_cluster
Oracle real application_clusterOracle real application_cluster
Oracle real application_cluster
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
Apache Spark on Supercomputers: A Tale of the Storage Hierarchy with Costin I...
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
How Incremental Compaction Reduces Your Storage Footprint
How Incremental Compaction Reduces Your Storage FootprintHow Incremental Compaction Reduces Your Storage Footprint
How Incremental Compaction Reduces Your Storage Footprint
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
DB2 and Storage Management
DB2 and Storage ManagementDB2 and Storage Management
DB2 and Storage Management
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big Data
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech Talks
 
Learn about log structured file system
Learn about log structured file systemLearn about log structured file system
Learn about log structured file system
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB

Scaling ScyllaDB Storage Engine with State-of-Art Compaction

  • 1. Squeezing the Most Out of the Storage Engine with State of the Art Compaction Raphael S. Carvalho, Software Engineer
  • 2. Raphael Carvalho ■ Syslinux, suite of bootloaders ■ OSv, an operating system for the cloud ■ Seastar, the framework powering ScyllaDB ■ ScyllaDB, the best database in the world
  • 3. “In order to make good use of the computer resources, one must organize files intelligently, making the retrieval process efficient.” The Ubiquitous B-Tree paper, 1979
  • 4. Storage Engines ■ Short & precise definition from the aforementioned paper: “allow users to store, update, and recall”
  • 5-8. Storage Engines ■ Two approaches for handling updates ■ In-place structure (ex: B+-tree) ■ [Figure: (k1,v1)(k2,v2) receives the update (k1,v3) and becomes (k1,v3)(k2,v2)]
  • 9-14. Storage Engines ■ Two approaches for handling updates ■ Out-of-place structure (ex: LSM-tree) ■ [Figure: (k1,v1)(k2,v2) is left untouched; the update (k1,v3) is written separately]
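The two update approaches above can be contrasted with a minimal sketch. The class names `InPlaceStore` and `OutOfPlaceStore` are illustrative assumptions, not part of any real storage engine; a dictionary stands in for a B+-tree and a list stands in for an append-only structure.

```python
# Sketch only: class names are hypothetical, chosen for illustration.

class InPlaceStore:
    """Overwrites the existing entry, as an in-place structure would."""

    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value  # the previous value is lost here

    def get(self, key):
        return self.entries.get(key)


class OutOfPlaceStore:
    """Appends every update; old versions stay until something merges them."""

    def __init__(self):
        self.log = []  # (key, value) pairs, oldest first

    def put(self, key, value):
        self.log.append((key, value))

    def get(self, key):
        for k, v in reversed(self.log):  # the newest version wins
            if k == key:
                return v
        return None


in_place = InPlaceStore()
out_of_place = OutOfPlaceStore()
for store in (in_place, out_of_place):
    store.put("k1", "v1")
    store.put("k2", "v2")
    store.put("k1", "v3")
```

Both stores answer a read of `k1` with `v3`, but only the out-of-place store still holds the stale `(k1, v1)` entry, which is the garbage that compaction later has to reclaim.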
  • 15. Storage Engines ■ Out-of-place update isn’t new ■ The 1976 paper “Differential files” shows its applicability in the real world ■ “shown to be an efficient method for storing a large and changing database”
  • 16. Storage Engines ■ A good analogy is presented in the paper
  • 17. Storage Engines ■ The Log-Structured Merge-Tree (LSM-Tree) paper is then published in 1996
  • 19. Storage Engines: THE LSM-TREE ■ C1 is T times bigger than C0; C(K) is T times bigger than C(K-1) ■ [Figure: C0 in MEMORY; C1, C2, ..., Ck on DISK; merge sort between adjacent components]
  • 20. Storage Engines ■ Immutability of LSM-tree components (ex: SSTables) simplifies: ■ Concurrency control ■ Recovery
  • 21. Query on LSM Tree ■ [Figure: Query k1; MEMORY holds (k1, v2), DISK holds (k1, v1)]
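A minimal sketch of the read path in the query slide, assuming the in-memory component is consulted first and on-disk components are ordered newest to oldest. The function name `lsm_get` and the use of plain dictionaries are assumptions made for illustration.

```python
def lsm_get(key, memtable, sstables):
    """Return the newest value for key, or None if it is absent.

    memtable: dict holding the most recent writes (in memory).
    sstables: list of dicts standing in for on-disk components,
              ordered from newest to oldest.
    """
    if key in memtable:
        return memtable[key]
    for sstable in sstables:
        if key in sstable:
            return sstable[key]  # first hit is the newest on-disk version
    return None


# Mirrors the slide: (k1, v2) in memory shadows the older (k1, v1) on disk.
memtable = {"k1": "v2"}
sstables = [{"k1": "v1", "k2": "v1"}]
result = lsm_get("k1", memtable, sstables)
```

The more components a key may live in, the more places a read may have to check, which is one reason compaction matters for read performance.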
  • 22. LSM-tree compaction policy ■ A compaction policy (or strategy) defines the shape of the LSM tree ■ Any policy is composed of 4 primitives: ■ Trigger (when to compact) ■ File picking policy (which data to compact) ■ Granularity (how much data at once) ■ Layout (how data is laid out)
  • 23. Pure Leveled in Original LSM Design ■ ONLY 1 COMPONENT PER LEVEL! ■ [Figure: C0 in MEMORY; C1, C2, ..., Ck on DISK; merge sort between adjacent components]
  • 24-26. Flexible Leveled in Modern LSM Design ■ [Figure: MEMORY above DISK; disk levels L0 and L1]
  • 27. Partitioning Optimization for Leveled ■ Partitions the LSM-tree components into (usually fixed-size) fragments ■ A subset of a level can be merged into the next one (partial merge) ■ Bounds: ■ compaction operation time ■ temporary disk space during compaction lifetime
  • 28-29. Partitioning Optimization for Leveled ■ [Figure: MEMORY above DISK; levels L1 and L2 laid out along the KEY RANGE axis, each made of several SSTs]
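The partial merge enabled by partitioning can be sketched as picking one fragment from a level plus only the fragments of the next level whose key ranges overlap it. The `SSTable` tuple, the integer key ranges, and the `overlapping` helper are hypothetical, introduced only for this illustration.

```python
from typing import List, NamedTuple


class SSTable(NamedTuple):
    first_key: int  # smallest key stored in the fragment
    last_key: int   # largest key stored in the fragment


def overlapping(candidate: SSTable, next_level: List[SSTable]) -> List[SSTable]:
    """Fragments of the next level whose key range intersects the candidate."""
    return [
        s for s in next_level
        if s.first_key <= candidate.last_key and candidate.first_key <= s.last_key
    ]


# Hypothetical layout: L1 has two fragments, L2 has five.
l1 = [SSTable(0, 49), SSTable(50, 99)]
l2 = [SSTable(0, 19), SSTable(20, 39), SSTable(40, 59),
      SSTable(60, 79), SSTable(80, 99)]

picked = l1[0]
compaction_inputs = [picked] + overlapping(picked, l2)
```

Only four of the seven fragments take part in this compaction, which is how partitioning bounds both the duration of one operation and the temporary space it needs.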
  • 30. Leveled Policy - Cost Analysis ■ Let T be the size ratio between adjacent levels ■ Let L be the number of levels for a given LSM tree ■ Write amplification: O(T * L) ■ Space amplification: O((T + 1) / T) = ~1.1 (e.g. for T = 10)
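A rough arithmetic check of the leveled figures, as a sketch under stated assumptions: T = 10 and L = 4 are illustrative values not given on the slide, and the space estimate assumes the worst case in which every entry of the second-largest level overwrites an entry of the largest level.

```python
# Illustrative numbers only: T = 10 and L = 4 are assumptions.
T = 10  # size ratio between adjacent levels
L = 4   # number of levels

# Level i is T times bigger than level i - 1 (sizes in arbitrary units).
level_sizes = [T ** i for i in range(L)]

# Assumed worst case: the second-largest level consists entirely of
# overwrites of data that is also stored in the largest level.
largest, second_largest = level_sizes[-1], level_sizes[-2]
space_amplification = (largest + second_largest) / largest  # (T + 1) / T

# Upper bound on rewrites per entry: up to T merges in each of L levels.
write_amplification_bound = T * L

print(f"space amplification ~ {space_amplification:.1f}")
print(f"write amplification bound = {write_amplification_bound}")
```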
  • 31. Stepped-Merge Algorithm ■ 1997 paper Incremental organization for data recording and warehousing -> a new approach to LSM tree layout ■ “Our goal is to design a technique that supports both insertion and queries with reasonable efficiency, and without the delays of periodic batch processing.” ■ Gives birth to the tiered compaction policy
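A minimal sketch of the tiered idea that grew out of stepped-merge, assuming each level accumulates up to T sorted runs and then merges all of them into a single run of the next level. The function name `stepped_merge_insert` and the use of dictionaries as runs are assumptions for illustration, not the paper's implementation.

```python
def stepped_merge_insert(levels, run, T):
    """Add a run to level 0 and cascade merges while a level holds T runs.

    levels: list of levels; each level is a list of runs, oldest first.
    run:    dict standing in for a sorted run of key/value pairs.
    T:      number of runs a level accumulates before it is merged.
    """
    level = 0
    while True:
        if level == len(levels):
            levels.append([])
        levels[level].append(run)
        if len(levels[level]) < T:
            return
        merged = {}
        for r in levels[level]:  # oldest first, so newer values overwrite
            merged.update(r)
        levels[level] = []
        run = merged             # the merged run moves to the next level
        level += 1


levels = []
for new_run in ({"a": 1}, {"a": 2}, {"b": 1}, {"c": 1}):
    stepped_merge_insert(levels, new_run, T=2)
```

Each entry is rewritten once per level it passes through rather than on every merge into a level, which is where the lower write amplification of tiered compaction comes from.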
  • 35. Tiered Policy - Cost Analysis ■ Let T be the size ratio between adjacent levels ■ Let L be the number of levels for a given LSM tree ■ Write amplification: O(L) ■ Space amplification: O(T * L)
  • 36. Now the ScyllaDB journey begins ■ The database inherited all the LSM-tree improvements described so far… but they weren’t enough
  • 37-38. Tiered - Temporary Space Problem! ■ [Figure: MEMORY above DISK; levels L0 and L1 ordered by FILE SIZE; compacting the SSTs writes a new SST alongside them: 100% TEMP SPACE OVERHEAD]
  • 39-41. Partitioning Optimization for Tiered ■ [Figure: MEMORY above DISK; levels L0 and L1 ordered by FILE SIZE; SSTs split into fragments]
  • 42. Tiered Policy - Partitioning Optimization ■ Bounds temporary space overhead significantly ■ Allows disk space usage from 50% to 80% and beyond. ■ Available in ScyllaDB as Incremental Compaction Strategy (ICS)
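The effect of partitioning on temporary space can be illustrated with a toy simulation. This is a simplified model under loud assumptions: all input runs have equal size, the merge consumes them at the same rate, nothing is garbage-collected, and an input fragment is deleted as soon as it is fully consumed. The function name `peak_extra_space` is hypothetical and this is not how ScyllaDB measures anything.

```python
def peak_extra_space(num_runs, run_size, fragment_size=None):
    """Peak temporary space as a fraction of the total input size.

    fragment_size=None models a non-partitioned compaction, where inputs
    can only be deleted after the whole output has been written.
    """
    total_input = num_runs * run_size
    freed = 0    # input space already released
    written = 0  # output space written so far
    peak = 0
    for consumed in range(1, run_size + 1):  # units read from each run
        written += num_runs
        peak = max(peak, written - freed)
        if fragment_size and consumed % fragment_size == 0:
            freed += num_runs * fragment_size  # drop exhausted fragments
    return peak / total_input


whole_file = peak_extra_space(num_runs=4, run_size=100)
fragmented = peak_extra_space(num_runs=4, run_size=100, fragment_size=10)
```

In this model the non-partitioned merge needs as much free space as its inputs occupy, while fragmenting the runs shrinks the peak to the ratio between fragment size and run size.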
  • 43-45. LSM tree - Efficiency Space ■ [Figure: a spectrum from SPACE OPTIMIZED to WRITE OPTIMIZED, with PURE LEVELED and PURE TIERED placed on it]
  • 46. But the world is not only black and white ■ There are shades of gray in between…
  • 47. Hybrid LSM-tree data layout ■ Largest level is space optimized ■ Other levels are write optimized ■ Addresses O(K) space amplification in tiered in overwrite workloads ■ Where K = number of components per level
  • 48-49. Hybrid LSM-tree data layout ■ [Figure: levels L0, L1 and L2 ordered by FILE SIZE; the smaller levels are marked WRITE OPTIMIZED LEVELS and the largest level is marked SPACE OPTIMIZED LEVEL]
  • 50-51. Hybrid LSM - Efficiency Space ■ [Figure: a spectrum from SPACE OPTIMIZED to WRITE OPTIMIZED showing PURE TIERED, PURE LEVELED and HYBRID]
  • 52. Hybrid LSM-tree data layout ■ Reduces space amplification in overwrite-intensive workloads ■ = less space amplification ■ = increased storage density per node ■ = more money in your pocket. ■ Available as space amplification goal (SAG) option of Incremental Compaction Strategy.
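One way to picture a space amplification goal is as a threshold on an estimate of wasted space. The sketch below is an assumption-heavy illustration, not ScyllaDB's implementation: it estimates space amplification as total size on disk divided by the size of the largest tier, and the function name `exceeds_space_goal` and its parameters are hypothetical.

```python
def exceeds_space_goal(tier_sizes, goal):
    """True when the estimated space amplification is above the goal.

    tier_sizes: total bytes held by each tier (hypothetical input).
    goal:       target ratio, e.g. 1.5 (hypothetical value).
    Assumption: the largest tier approximates the amount of live data.
    """
    largest = max(tier_sizes)
    estimated_amplification = sum(tier_sizes) / largest
    return estimated_amplification > goal


# Two equally large tiers: plenty of potentially duplicated data.
before = exceeds_space_goal([10, 10, 100, 100], goal=1.5)
# After the two largest tiers are merged into one.
after = exceeds_space_goal([10, 10, 200], goal=1.5)
```

Under this estimate, merging the largest tiers is what brings the ratio back under the goal, matching the idea that only the largest level needs to be space optimized.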
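  • 54. LSM-tree & tombstones ■ [Figure: MEMORY above DISK; levels L0 and L1 ordered by FILE SIZE; labels: KEY A, KEY A TOMBSTONE]
  • 56. Suboptimal LSM-tree tombstone handling ■ [Figure: MEMORY above DISK; levels L0 and L1 ordered by FILE SIZE; labels: KEY A, KEY A, GARBAGE COLLECTION]
  • 57. Efficient LSM-tree tombstone handling ■ [Figure: MEMORY above DISK; levels L0 and L1 ordered by FILE SIZE; labels: KEY A, KEY A, GARBAGE COLLECTION]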
  • 58. Efficient LSM-tree tombstone handling ■ Piggyback on incremental compaction, to bound temporary disk space. ■ Triggers (avoids write amplification issues): ■ File staleness ■ Tombstone density threshold ■ Available in Incremental Compaction Strategy (ICS) by default.
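The two triggers listed above can be sketched as a simple predicate. The slide does not say how the triggers combine, so requiring both at once is an assumption; the `SSTableStats` fields, the parameter names, and the threshold values are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    age_seconds: float  # time since the file was written
    tombstones: int     # number of tombstones in the file
    total_cells: int    # number of cells in the file


def should_garbage_collect(stats, min_age_seconds, density_threshold):
    """Assumed rule: the file must be stale AND dense in tombstones."""
    if stats.total_cells == 0:
        return False
    density = stats.tombstones / stats.total_cells
    return stats.age_seconds >= min_age_seconds and density >= density_threshold


one_day = 24 * 60 * 60
stale_and_dense = SSTableStats(age_seconds=90_000, tombstones=30, total_cells=100)
fresh_but_dense = SSTableStats(age_seconds=600, tombstones=30, total_cells=100)
stale_but_sparse = SSTableStats(age_seconds=90_000, tombstones=5, total_cells=100)

decisions = [
    should_garbage_collect(s, min_age_seconds=one_day, density_threshold=0.2)
    for s in (stale_and_dense, fresh_but_dense, stale_but_sparse)
]
```

Gating on both conditions keeps a file from being rewritten repeatedly for little gain, which is the write amplification concern the slide mentions.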
  • 59. Thank You Stay in Touch Raphael Carvalho raphaelsc@scylladb.com @raphael_scarv raphaelsc