SlideShare a Scribd company logo
1 of 46
Download to read offline
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
SCYLLA’S COMPACTION STRATEGIES
“Big Things” meetup @ Oracle, March 2019.
Nadav Har’El,
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Presenter bio
2
Nadav Har’El has had a diverse 20-year career in
computer programming and computer science. He
worked on high-performance scientific computing,
networking software, information retrieval,
virtualization and operating systems.
Today he works on Scylla.
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Agenda
▪ What is compaction?
▪ Scylla’s compaction strategies:
o Size Tier
o Leveled
o Incremental
o Date Tier
o Time Window
▪ Which should I use for my workload and why?
▪ Examples
3
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
What is compaction?
Scylla’s write path:
4
Writes
commit log
compaction
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
(What is compaction?)
▪ Scylla’s write path:
o Updates are inserted into a memory table (“memtable”)
o Memtables are periodically flushed to a new sorted file (“sstable”)
▪ After a while, we have many separate sstables
o Different sstables may contain old and new values of the same cell
o Or different rows in the same partition
o Wastes disk space
o Slows down reads
▪ Compaction: read several sstables and output one (or more)
containing the merged and most recent information
5
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
What is compaction? (cont.)
▪ This technique of keeping sorted files and merging them is
well-known and often called Log-Structured Merge (LSM) Tree
▪ Published in 1996 (but differs from modern implementations).
▪ Earliest popular application that I know of is the Lucene search
engine, 1999
o High performance write.
o Immediately readable.
o Reasonable performance for read.
6
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Compaction Strategy
▪ Which sstables to compact, and when?
▪ This is called the compaction strategy
▪ The goal of the strategy is low amplification:
o Avoid compacting the same data again and again.
• write amplification
o Avoid read requests needing many sstables.
• read amplification
o Avoid overwritten/deleted/expired data staying on disk.
o Avoid excessive temporary disk space needs.
• space amplification
7
Which compaction
strategy shall I
choose?
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Strategy #1: Size-Tiered Compaction
▪ Cassandra’s oldest, and still default, compaction strategy.
▪ The default on Scylla too.
▪ Dates back to Google’s BigTable paper (2006).
o Idea used even earlier (e.g., Lucene, 1999).
8
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Size-Tiered compaction strategy
9
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
(Size-Tiered compaction strategy)
▪ Each time when enough data is in the memory table, flush it to a
small sstable
▪ When several small sstables exist, compact them into one bigger
sstable
▪ When several bigger sstables exist, compact them into one very big
sstable
▪ …
▪ Each time one “size tier” has enough sstables, compact them into
one sstable in the (usually) next size tier
10
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Size-Tiered compaction - amplification
▪ write amplification: O(logN)
o Most data is in highest tier - needed to pass through O(logN) tiers
• Where “N” is (data size) / (flushed sstable size).
o This is asymptotically optimal
11
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Size-Tiered compaction - amplification
Read amplification? O(logN) sstables, but:
▪ If workload writes a partition once and never modifies it:
o Eventually each partition’s data will be compacted into one sstable
o In-memory bloom filter will usually allow reading only one sstable
o Optimal
▪ But if workload continues to update the same partition:
o All sstables will contain updates to the same partition
o O(logN) reads per read request
o Reasonable, but not great
12
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Size-Tiered compaction - amplification
▪ Space amplification
13
“storage space is most
often the primary
bottleneck when using
Flash SSDs under typical
production workloads at
Facebook”
Facebook researchers, 2017.
http://cidrdb.org/cidr2017/papers/p82-dong
-cidr17.pdf
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Size-Tiered compaction - amplification
Space amplification:
▪ Obsolete data in a huge sstable will remain for a very long time.
▪ Compaction needs a lot of temporary space:
o Worst-case, needs to merge all existing sstables into one and may need
half the disk to be empty for the merged result. (2x)
o Less of a problem in Scylla than Cassandra because of sharding.
o Still a risk - so best practice is to leave half the disk empty...
14
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Size-Tiered compaction - amplification
Space amplification (continued):
▪ When workload is overwrite-intensive, it is even worse:
o Cassandra waits until 4 large sstables.
o All 4 overwrote the same data, so merged amount is same as in 1 sstable
o 5-fold space amplification!
o Even worse: same data in several tiers.
▪ Scylla reduces this space amplification vs. Cassandra:
o Wait for only two sstables of similar size!
o May wait for more depending on workload (automatic controller)
15
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Strategy #2: Leveled Compaction
▪ Introduced in Cassandra 1.0, in 2011.
▪ Based on Google’s LevelDB (itself based on Google’s BigTable)
▪ Aims to reduce space and read amplifications.
▪ Replaces huge SSTables with runs:
o A run is a big SSTable split into small (160 MB by default) SSTables
o These have non-overlapping key ranges
o A huge SSTable must be rewritten as a whole, but in a run we can modify only
parts of it (individual sstables) while keeping the disjoint key requirement
▪ In leveled compaction strategy:
16
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Leveled compaction strategy
17
Level 0
Level 1
(run of 10
sstables) Level 2
(run of 100
sstables)
...
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
(Leveled compaction strategy)
▪ SSTables are divided into “levels”:
o New SSTables (dumped from memtables) are created in Level 0
o Each other level is a run of SSTables of exponentially increasing size:
• Level 1 is a run of 10 SSTables (of 160 MB each)
• Level 2 is a run of 100 SSTables (of 160 MB each)
• etc.
▪ When we have enough (e.g., 4) sstables in Level 0, we compact
them with all 10 sstables in Level 1
o We don't create one large sstable - rather, a run: we write one sstable and
when we reach the size limit (160 MB), we start a new sstable
18
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
(Leveled compaction strategy)
▪ After the compaction of level 0 into level 1, level 1 may have more
than 10 of sstables. We pick one and compact it into level 2:
o Take one sstable from level 1
o Look at its key range and find all sstables in level 2 which overlap with it
o Typically, there are about 12 of these
• The level 1 sstable spans roughly 1/10th of the keys, while each level 2
sstable spans 1/100th of the keys; so a level-1 sstable’s range roughly
overlaps 10 level-2 sstables plus two more on the edges
o As before, we compact the one sstable from level 1 and the 12 sstables from
level 2 and replace all of those with new sstables in level 2
19
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
(Leveled compaction strategy)
▪ After this compaction of level 1 into level 2, now we can have
excess sstables in level 2 so we merge them into level 3. Again, one
sstable from level 2 will need to be compacted with around 10
sstables from level 3.
20
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Leveled compaction - amplification
▪ Space amplification:
o Because of sstable counts, 90% of the data is in the deepest level (if full!)
o These sstables do not overlap, so it can’t have duplicate data!
o So at most, 10% of the space is wasted
• (Actually, today, 50%, if deepest level is not full. Can be improved.)
o Also, each compaction needs a constant (~12*160MB) temporary space
o Nearly optimal
21
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Leveled compaction - amplification
▪ Read amplification:
o We have O(N) tables!
o But in each level sstables have disjoint ranges (cached in memory)
o Worst-case, O(logN) sstables relevant to a partition - plus L0 size.
o Under some assumptions 10% space amplification implies: 90% of the reads
will need just one sstable!
o Nearly optimal
22
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Leveled compaction - amplification
▪ Write amplification:
23
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Leveled compaction - amplification
▪ Write amplification:
o Also O(logN) steps like size-tiered:
• Again, most of the data is in the deepest level k
• All data was written once in L0, then compacted into L1, …, Lk
o But... significantly more writes in each step!
• STCS: reads sstables of total size X, writes X (if no overwrites)
• LCS: one input sstable size X (Li) - merged with ~10X (Li+1) write ~ 11X.
o If enough writing and LCS can’t keep up, its read and space advantages are
lost
o If also have cache-miss reads, they will get less disk bandwidth
24
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Leveled compaction - write amplification
▪ Leveled compactions write amplification is not only a problem with
100% write...
▪ Can have just 10% writes and an amplified write workload so high
that
o Uncached reads slowed down because we need the disk to write
o Compaction can’t keep up, uncompacted sstables pile up, even slower reads
▪ Leveled compaction is unsuitable for many workloads with a
non-negligible amount of writes even if they seem “read mostly”
25
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 1 - write-only workload
▪ Write-only workload
o Cassandra-stress writing 30 million partitions (about 9 GB of data)
o Constant write rate 10,000 writes/second
o One shard
26
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 1 (space amplification)
constant multiple of
flushed memtable &
sstable size
27
x2 space
amplification
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 1 (write amplification)
▪ Amount of actual data collected: 8.8 GB
▪ Size-tiered compaction: 50 GB writes (4 tiers + commit log)
▪ Leveled compaction: 111 GB writes
28
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 1 - conclusions
▪ Size-tiered compaction:
at some points needs twice the disk space
o In Scylla with many shards, “usually” maximum space use is not concurrent
▪ Level-tiered compaction:
more than double the amount of disk I/O
o Test used smaller-than default sstables (10 MB) to illustrate the problem
o Same problem with default sstable size (160 MB) - with larger workloads
29
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Can we create a new compaction strategy with
▪ Low write amplification of size-tiered compaction,
▪ Without its high temporary disk space usage during compaction?
30
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Strategy #3: Incremental Compaction
▪ New in Scylla Enterprise 2019.1
▪ Improved Size-Tiered strategy, with ideas from Leveled strategy:
31
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Strategy #3: Incremental Compaction
▪ Size-tiered compaction needs temporary space because we only
remove a huge sstable after we fully compact it.
▪ Let’s split each huge sstable into a run (as in LCS):
o Treat the entire run (not individual sstables) as a file for STCS
o Remove individual sstables as compacted. Low temporary space.
32
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Incremental compaction - amplification
▪ Space amplification:
o Small constant temporary space needs, even smaller than LCS
(M*S per parallel compaction, e.g., M=2, S=160 MB)
o Overwrite-mostly still a worst-case - but 2-fold, not 5-fold as Cassandra.
▪ Write amplification:
o Same as size-tiered: O(logN), small constant
▪ Read amplification:
o Same as size-tiered: at worst O(logN) if updating the same partitions
33
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 1, with incremental compaction
34
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 2 - overwrite workload
▪ Write 15 times the same 4 million partitions
o cassandra-stress write n=4000000 -pop seq=1..4000000 -schema
"replication(strategy=org.apache.cassandra.locator.SimpleStrategy,factor=1)"
o In this test cassandra-stress not rate limited
o Again, small (10MB) LCS tables
▪ Necessary amount of sstable data: 1.2 GB
▪ STCS space amplification: x7.7 (with Cassandra’s default of 4)
▪ LCS space amplification lower, constant multiple of sstable size
▪ ICS and Scylla’s default of 2, around x2
35
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 2
36
x7.7
space
amplification
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 3 - read+updates workload
▪ When workloads are read-mostly, read amplification is important
▪ When workloads also have updates to existing partitions
o With STCS, each partition ends up in multiple sstables
o Read amplification
▪ An example to simulate this:
o Do a write-only update workload
• cassandra_stress write n=4,000,000 -pop seq=1..1,000,000
o Now run a read-only workload
• cassandra_stress read n=1,000,000 -pop seq=1..1,000,000
• measure avg. number of read bytes per request
37
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 3 - read+updates workload
▪ Size-tiered: 46,915 bytes read per request
o Optimal after major compaction - 11,979
▪ Leveled: 11,982
o Equal to optimal because in this case all sstables fit in L1...
▪ Increasing the number of partitions 8-fold:
o Size-tiered: 29,794 luckier this time
o Leveled: 16,713 unlucky (0.5 of data, not 0.9, in L2)
▪ BUT: Remember that if we have non-negligable amount of writes,
LCS write amplification may slow down reads
38
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Example 3, and major compaction
▪ We saw that size-tiered major compaction reduces read
amplification
▪ It also reduces space amplification (expired/overwritten data)
▪ Major compaction only makes sense if very few writes
o But in that case, LCS’s write amplification is not a problem!
o So LCS is recommended instead of major compaction
• Easier to use
• No huge operations like major compaction (need to find when to run)
• No 50%-free-disk worst-case requirement
• Good read amplification and space amplification
39
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Strategy #4: Time-Window Compaction
▪ Introduced in Cassandra 3.0.8, designed for time-series data
▪ Replaces Date-Tiered compaction strategy of Cassandra 2.1
(which is also supported by Scylla, but not recommended)
40
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Time-Window compaction strategy (cont.)
In a time-series use case:
▪ Clustering key and write time are correlated
▪ Data is added in time order. Only few out-of-order writes, typically
rearranged by just a few seconds
▪ Data is only deleted through expiration (TTL) or by deleting an
entire partition, usually the same TTL on all the data
▪ A query is a clustering-key range query on a given partition
Most common query: "values from the last hour/day/week"
41
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Time-Window compaction strategy (cont.)
▪ Scylla remembers in memory the minimum and maximum
clustering key in each newly-flushed sstable
o Efficiently find only the sstables with data relevant to a query
▪ But most compaction strategies,
o Destroy this feature by merging “old” and “new” sstables
o Move all rows of a partition to the same sstable…
• But time series queries don’t need all rows of a partition, just rows in a
given time range
• Makes it impossible to expire old sstable’s when everything in them has
expired
• Read and write amplification (needless compactions)
42
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Time-Window compaction strategy (cont.)
So TWCS:
▪ Divides time into “time windows”
o E.g., if typical query asks for 1 day of data, choose a time window of 1 day
▪ Divide sstables into time buckets, according to time window
▪ Compact using Size-Tiered strategy inside each time bucket
o If the 2-day old window has just one big sstable and a repair creates an
additional tiny “old” sstable, the two will not get compacted
o A tradeoff: slows read but avoids the write amplification problem of DTCS
▪ When the current time window ends, major compaction of bucket
o Except for small repair-produced sstables, we get 1 sstable per time window
43
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Summary
44
Workload Size-Tiered Leveled Incremental Time-Window
Write-only 2x peak space 2x writes Best -
Overwrite Huge peak
space
write
amplification
high peak
space, but not
like size-tiered
-
Read-mostly,
few updates
read
amplification
Best read
amplification
-
Read-mostly,
but a lot of
updates
read and space
amplification
write
amplification
may overwhelm
read
amplification
-
Time series write, read, and
space ampl.
write and space
amplification
write and read
amplification
Best
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
nyh@scylladb.com
Please stay in touch
Any questions?
For more information, see my blog posts:
1. https://www.scylladb.com/2018/01/17/compaction-series-space-a
mplification/ Space Amplification in Size-Tiered
Compaction
2. https://www.scylladb.com/2018/01/31/compaction-series-leveled-
compaction/ Write Amplification in Leveled Compaction
3. Soon to come, post about Incremental Compaction.
PRESENTATION TITLE ON ONE LINE
AND ON TWO LINES
First and last name
Position, company
Questions?
nyh@scylladb.com
Please stay in touch
Any questions?
Thank you for listening!

More Related Content

What's hot

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architectureAdam Doyle
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingScyllaDB
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowDremio Corporation
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...Dremio Corporation
 
Data Warehouse Best Practices
Data Warehouse Best PracticesData Warehouse Best Practices
Data Warehouse Best PracticesEduardo Castro
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheDremio Corporation
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit... Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...Databricks
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseDatabricks
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtJon Su
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Chris Fregly
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureImproving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureDatabricks
 
Survey of High Performance NoSQL Systems
Survey of High Performance NoSQL SystemsSurvey of High Performance NoSQL Systems
Survey of High Performance NoSQL SystemsScyllaDB
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...Databricks
 
Delta: Building Merge on Read
Delta: Building Merge on ReadDelta: Building Merge on Read
Delta: Building Merge on ReadDatabricks
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jjexp
 

What's hot (20)

Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data Modeling
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Data Warehouse Best Practices
Data Warehouse Best PracticesData Warehouse Best Practices
Data Warehouse Best Practices
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit... Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
Near Real-Time Netflix Recommendations Using Apache Spark Streaming with Nit...
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Common Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta LakehouseCommon Strategies for Improving Performance on Your Delta Lakehouse
Common Strategies for Improving Performance on Your Delta Lakehouse
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated ArchitectureImproving Apache Spark by Taking Advantage of Disaggregated Architecture
Improving Apache Spark by Taking Advantage of Disaggregated Architecture
 
Survey of High Performance NoSQL Systems
Survey of High Performance NoSQL SystemsSurvey of High Performance NoSQL Systems
Survey of High Performance NoSQL Systems
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...
Using Delta Lake to Transform a Legacy Apache Spark to Support Complex Update...
 
Delta: Building Merge on Read
Delta: Building Merge on ReadDelta: Building Merge on Read
Delta: Building Merge on Read
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 

Similar to Scylla Compaction Strategies

Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...
Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...
Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...ScyllaDB
 
Scylla Summit 2017: Keynote, Looking back, looking ahead
Scylla Summit 2017: Keynote, Looking back, looking aheadScylla Summit 2017: Keynote, Looking back, looking ahead
Scylla Summit 2017: Keynote, Looking back, looking aheadScyllaDB
 
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data Center
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data CenterScylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data Center
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data CenterScyllaDB
 
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...ScyllaDB
 
If You Care About Performance, Use User Defined Types
If You Care About Performance, Use User Defined TypesIf You Care About Performance, Use User Defined Types
If You Care About Performance, Use User Defined TypesScyllaDB
 
Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Eric Evans
 
Apache spark on planet scale
Apache spark on planet scaleApache spark on planet scale
Apache spark on planet scaleDenis Chapligin
 
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...ScyllaDB
 
Scaling MySQL in Amazon Web Services
Scaling MySQL in Amazon Web ServicesScaling MySQL in Amazon Web Services
Scaling MySQL in Amazon Web ServicesLaine Campbell
 
Db2 performance tuning for dummies
Db2 performance tuning for dummiesDb2 performance tuning for dummies
Db2 performance tuning for dummiesAngel Dueñas Neyra
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Dave Anselmi
 
Ceph Day Chicago - Ceph at work at Bloomberg
Ceph Day Chicago - Ceph at work at Bloomberg Ceph Day Chicago - Ceph at work at Bloomberg
Ceph Day Chicago - Ceph at work at Bloomberg Ceph Community
 
Scylla Summit 2017: From Elasticsearch to Scylla at Zenly
Scylla Summit 2017: From Elasticsearch to Scylla at ZenlyScylla Summit 2017: From Elasticsearch to Scylla at Zenly
Scylla Summit 2017: From Elasticsearch to Scylla at ZenlyScyllaDB
 
Using ТРСС to study Firebird performance
Using ТРСС to study Firebird performanceUsing ТРСС to study Firebird performance
Using ТРСС to study Firebird performanceMind The Firebird
 
Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field
Scylla Summit 2017: A Toolbox for Understanding Scylla in the FieldScylla Summit 2017: A Toolbox for Understanding Scylla in the Field
Scylla Summit 2017: A Toolbox for Understanding Scylla in the FieldScyllaDB
 
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupBeyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupHolden Karau
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 NotesRoss Lawley
 
Scylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum PerformanceScylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum PerformanceScyllaDB
 
Scylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScyllaDB
 

Similar to Scylla Compaction Strategies (20)

Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...
Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...
Scylla Summit 2017: How to Ruin Your Workload's Performance by Choosing the W...
 
Scylla Summit 2017: Keynote, Looking back, looking ahead
Scylla Summit 2017: Keynote, Looking back, looking aheadScylla Summit 2017: Keynote, Looking back, looking ahead
Scylla Summit 2017: Keynote, Looking back, looking ahead
 
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data Center
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data CenterScylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data Center
Scylla Summit 2017: Intel Optane SSDs as the New Accelerator in Your Data Center
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend fo...
 
If You Care About Performance, Use User Defined Types
If You Care About Performance, Use User Defined TypesIf You Care About Performance, Use User Defined Types
If You Care About Performance, Use User Defined Types
 
Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)Wikimedia Content API (Strangeloop)
Wikimedia Content API (Strangeloop)
 
Apache spark on planet scale
Apache spark on planet scaleApache spark on planet scale
Apache spark on planet scale
 
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
TechTalk: Reduce Your Storage Footprint with a Revolutionary New Compaction S...
 
Scaling MySQL in Amazon Web Services
Scaling MySQL in Amazon Web ServicesScaling MySQL in Amazon Web Services
Scaling MySQL in Amazon Web Services
 
Db2 performance tuning for dummies
Db2 performance tuning for dummiesDb2 performance tuning for dummies
Db2 performance tuning for dummies
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
 
Ceph Day Chicago - Ceph at work at Bloomberg
Ceph Day Chicago - Ceph at work at Bloomberg Ceph Day Chicago - Ceph at work at Bloomberg
Ceph Day Chicago - Ceph at work at Bloomberg
 
Scylla Summit 2017: From Elasticsearch to Scylla at Zenly
Scylla Summit 2017: From Elasticsearch to Scylla at ZenlyScylla Summit 2017: From Elasticsearch to Scylla at Zenly
Scylla Summit 2017: From Elasticsearch to Scylla at Zenly
 
Using ТРСС to study Firebird performance
Using ТРСС to study Firebird performanceUsing ТРСС to study Firebird performance
Using ТРСС to study Firebird performance
 
Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field
Scylla Summit 2017: A Toolbox for Understanding Scylla in the FieldScylla Summit 2017: A Toolbox for Understanding Scylla in the Field
Scylla Summit 2017: A Toolbox for Understanding Scylla in the Field
 
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark MeetupBeyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
Beyond Shuffling and Streaming Preview - Salt Lake City Spark Meetup
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 Notes
 
Scylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum PerformanceScylla Summit 2017: Planning Your Queries for Maximum Performance
Scylla Summit 2017: Planning Your Queries for Maximum Performance
 
Scylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of ScyllaScylla Summit 2018: Keynote - 4 Years of Scylla
Scylla Summit 2018: Keynote - 4 Years of Scylla
 

Recently uploaded

What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 

Recently uploaded (20)

What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 

Scylla Compaction Strategies

  • 1. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company SCYLLA’S COMPACTION STRATEGIES “Big Things” meetup @ Oracle, March 2019. Nadav Har’El,
  • 2. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Presenter bio 2 Nadav Har’El has had a diverse 20-year career in computer programming and computer science. He worked on high-performance scientific computing, networking software, information retrieval, virtualization and operating systems. Today he works on Scylla.
  • 3. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Agenda ▪ What is compaction? ▪ Scylla’s compaction strategies: o Size Tier o Leveled o Incremental o Date Tier o Time Window ▪ Which should I use for my workload and why? ▪ Examples 3
  • 4. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company What is compaction? Scylla’s write path: 4 Writes commit log compaction
  • 5. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company (What is compaction?) ▪ Scylla’s write path: o Updates are inserted into a memory table (“memtable”) o Memtables are periodically flushed to a new sorted file (“sstable”) ▪ After a while, we have many separate sstables o Different sstables may contain old and new values of the same cell o Or different rows in the same partition o Wastes disk space o Slows down reads ▪ Compaction: read several sstables and output one (or more) containing the merged and most recent information 5
  • 6. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company What is compaction? (cont.) ▪ This technique of keeping sorted files and merging them is well-known and often called Log-Structured Merge (LSM) Tree ▪ Published in 1996 (but differs from modern implementations). ▪ Earliest popular application that I know of is the Lucene search engine, 1999 o High performance write. o Immediately readable. o Reasonable performance for read. 6
  • 7. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Compaction Strategy ▪ Which sstables to compact, and when? ▪ This is called the compaction strategy ▪ The goal of the strategy is low amplification: o Avoid compacting the same data again and again. • write amplification o Avoid read requests needing many sstables. • read amplification o Avoid overwritten/deleted/expired data staying on disk. o Avoid excessive temporary disk space needs. • space amplification 7 Which compaction strategy shall I choose?
  • 8. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Strategy #1: Size-Tiered Compaction ▪ Cassandra’s oldest, and still default, compaction strategy. ▪ The default on Scylla too. ▪ Dates back to Google’s BigTable paper (2006). o Idea used even earlier (e.g., Lucene, 1999). 8
  • 9. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Size-Tiered compaction strategy 9
  • 10. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company (Size-Tiered compaction strategy) ▪ Each time when enough data is in the memory table, flush it to a small sstable ▪ When several small sstables exist, compact them into one bigger sstable ▪ When several bigger sstables exist, compact them into one very big sstable ▪ … ▪ Each time one “size tier” has enough sstables, compact them into one sstable in the (usually) next size tier 10
  • 11. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Size-Tiered compaction - amplification ▪ write amplification: O(logN) o Most data is in highest tier - needed to pass through O(logN) tiers • Where “N” is (data size) / (flushed sstable size). o This is asymptotically optimal 11
  • 12. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Size-Tiered compaction - amplification Read amplification? O(logN) sstables, but: ▪ If workload writes a partition once and never modifies it: o Eventually each partition’s data will be compacted into one sstable o In-memory bloom filter will usually allow reading only one sstable o Optimal ▪ But if workload continues to update the same partition: o All sstables will contain updates to the same partition o O(logN) reads per read request o Reasonable, but not great 12
  • 13. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Size-Tiered compaction - amplification ▪ Space amplification 13 “storage space is most often the primary bottleneck when using Flash SSDs under typical production workloads at Facebook” Facebook researchers, 2017. http://cidrdb.org/cidr2017/papers/p82-dong -cidr17.pdf
  • 14. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Size-Tiered compaction - amplification Space amplification: ▪ Obsolete data in a huge sstable will remain for a very long time. ▪ Compaction needs a lot of temporary space: o Worst-case, needs to merge all existing sstables into one and may need half the disk to be empty for the merged result. (2x) o Less of a problem in Scylla than Cassandra because of sharding. o Still a risk - so best practice is to leave half the disk empty... 14
  • 15. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Size-Tiered compaction - amplification Space amplification (continued): ▪ When workload is overwrite-intensive, it is even worse: o Cassandra waits until 4 large sstables. o All 4 overwrote the same data, so merged amount is same as in 1 sstable o 5-fold space amplification! o Even worse: same data in several tiers. ▪ Scylla reduces this space amplification vs. Cassandra: o Wait for only two sstables of similar size! o May wait for more depending on workload (automatic controller) 15
  • 16. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Strategy #2: Leveled Compaction ▪ Introduced in Cassandra 1.0, in 2011. ▪ Based on Google’s LevelDB (itself based on Google’s BigTable) ▪ Aims to reduce space and read amplifications. ▪ Replaces huge SSTables with runs: o A run is a big SSTable split into small (160 MB by default) SSTables o These have non-overlapping key ranges o A huge SSTable must be rewritten as a whole, but in a run we can modify only parts of it (individual sstables) while keeping the disjoint key requirement ▪ In leveled compaction strategy: 16
  • 17. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Leveled compaction strategy 17 Level 0 Level 1 (run of 10 sstables) Level 2 (run of 100 sstables) ...
  • 18. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company (Leveled compaction strategy) ▪ SSTables are divided into “levels”: o New SSTables (dumped from memtables) are created in Level 0 o Each other level is a run of SSTables of exponentially increasing size: • Level 1 is a run of 10 SSTables (of 160 MB each) • Level 2 is a run of 100 SSTables (of 160 MB each) • etc. ▪ When we have enough (e.g., 4) sstables in Level 0, we compact them with all 10 sstables in Level 1 o We don't create one large sstable - rather, a run: we write one sstable and when we reach the size limit (160 MB), we start a new sstable 18
  • 19. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company (Leveled compaction strategy) ▪ After the compaction of level 0 into level 1, level 1 may have more than 10 of sstables. We pick one and compact it into level 2: o Take one sstable from level 1 o Look at its key range and find all sstables in level 2 which overlap with it o Typically, there are about 12 of these • The level 1 sstable spans roughly 1/10th of the keys, while each level 2 sstable spans 1/100th of the keys; so a level-1 sstable’s range roughly overlaps 10 level-2 sstables plus two more on the edges o As before, we compact the one sstable from level 1 and the 12 sstables from level 2 and replace all of those with new sstables in level 2 19
  • 20. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company (Leveled compaction strategy) ▪ After this compaction of level 1 into level 2, now we can have excess sstables in level 2 so we merge them into level 3. Again, one sstable from level 2 will need to be compacted with around 10 sstables from level 3. 20
  • 21. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Leveled compaction - amplification ▪ Space amplification: o Because of sstable counts, 90% of the data is in the deepest level (if full!) o These sstables do not overlap, so it can’t have duplicate data! o So at most, 10% of the space is wasted • (Actually, today, 50%, if deepest level is not full. Can be improved.) o Also, each compaction needs a constant (~12*160MB) temporary space o Nearly optimal 21
  • 22. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Leveled compaction - amplification ▪ Read amplification: o We have O(N) tables! o But in each level sstables have disjoint ranges (cached in memory) o Worst-case, O(logN) sstables relevant to a partition - plus L0 size. o Under some assumptions 10% space amplification implies: 90% of the reads will need just one sstable! o Nearly optimal 22
  • 23. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Leveled compaction - amplification ▪ Write amplification: 23
  • 24. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Leveled compaction - amplification ▪ Write amplification: o Also O(logN) steps like size-tiered: • Again, most of the data is in the deepest level k • All data was written once in L0, then compacted into L1, …, Lk o But... significantly more writes in each step! • STCS: reads sstables of total size X, writes X (if no overwrites) • LCS: one input sstable size X (Li) - merged with ~10X (Li+1) write ~ 11X. o If enough writing and LCS can’t keep up, its read and space advantages are lost o If also have cache-miss reads, they will get less disk bandwidth 24
  • 25. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Leveled compaction - write amplification ▪ Leveled compactions write amplification is not only a problem with 100% write... ▪ Can have just 10% writes and an amplified write workload so high that o Uncached reads slowed down because we need the disk to write o Compaction can’t keep up, uncompacted sstables pile up, even slower reads ▪ Leveled compaction is unsuitable for many workloads with a non-negligible amount of writes even if they seem “read mostly” 25
  • 26. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 1 - write-only workload ▪ Write-only workload o Cassandra-stress writing 30 million partitions (about 9 GB of data) o Constant write rate 10,000 writes/second o One shard 26
  • 27. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 1 (space amplification) constant multiple of flushed memtable & sstable size 27 x2 space amplification
  • 28. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 1 (write amplification) ▪ Amount of actual data collected: 8.8 GB ▪ Size-tiered compaction: 50 GB writes (4 tiers + commit log) ▪ Leveled compaction: 111 GB writes 28
  • 29. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 1 - conclusions ▪ Size-tiered compaction: at some points needs twice the disk space o In Scylla with many shards, “usually” maximum space use is not concurrent ▪ Level-tiered compaction: more than double the amount of disk I/O o Test used smaller-than default sstables (10 MB) to illustrate the problem o Same problem with default sstable size (160 MB) - with larger workloads 29
  • 30. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Can we create a new compaction strategy with ▪ Low write amplification of size-tiered compaction, ▪ Without its high temporary disk space usage during compaction? 30
  • 31. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Strategy #3: Incremental Compaction ▪ New in Scylla Enterprise 2019.1 ▪ Improved Size-Tiered strategy, with ideas from Leveled strategy: 31
  • 32. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Strategy #3: Incremental Compaction ▪ Size-tiered compaction needs temporary space because we only remove a huge sstable after we fully compact it. ▪ Let’s split each huge sstable into a run (as in LCS): o Treat the entire run (not individual sstables) as a file for STCS o Remove individual sstables as compacted. Low temporary space. 32
  • 33. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Incremental compaction - amplification ▪ Space amplification: o Small constant temporary space needs, even smaller than LCS (M*S per parallel compaction, e.g., M=2, S=160 MB) o Overwrite-mostly still a worst-case - but 2-fold, not 5-fold as Cassandra. ▪ Write amplification: o Same as size-tiered: O(logN), small constant ▪ Read amplification: o Same as size-tiered: at worst O(logN) if updating the same partitions 33
  • 34. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 1, with incremental compaction 34
  • 35. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 2 - overwrite workload ▪ Write 15 times the same 4 million partitions o cassandra-stress write n=4000000 -pop seq=1..4000000 -schema "replication(strategy=org.apache.cassandra.locator.SimpleStrategy,factor=1)" o In this test cassandra-stress not rate limited o Again, small (10MB) LCS tables ▪ Necessary amount of sstable data: 1.2 GB ▪ STCS space amplification: x7.7 (with Cassandra’s default of 4) ▪ LCS space amplification lower, constant multiple of sstable size ▪ ICS and Scylla’s default of 2, around x2 35
  • 36. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 2 36 x7.7 space amplification
  • 37. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 3 - read+updates workload ▪ When workloads are read-mostly, read amplification is important ▪ When workloads also have updates to existing partitions o With STCS, each partition ends up in multiple sstables o Read amplification ▪ An example to simulate this: o Do a write-only update workload • cassandra_stress write n=4,000,000 -pop seq=1..1,000,000 o Now run a read-only workload • cassandra_stress read n=1,000,000 -pop seq=1..1,000,000 • measure avg. number of read bytes per request 37
  • 38. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 3 - read+updates workload ▪ Size-tiered: 46,915 bytes read per request o Optimal after major compaction - 11,979 ▪ Leveled: 11,982 o Equal to optimal because in this case all sstables fit in L1... ▪ Increasing the number of partitions 8-fold: o Size-tiered: 29,794 luckier this time o Leveled: 16,713 unlucky (0.5 of data, not 0.9, in L2) ▪ BUT: Remember that if we have non-negligable amount of writes, LCS write amplification may slow down reads 38
  • 39. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Example 3, and major compaction ▪ We saw that size-tiered major compaction reduces read amplification ▪ It also reduces space amplification (expired/overwritten data) ▪ Major compaction only makes sense if very few writes o But in that case, LCS’s write amplification is not a problem! o So LCS is recommended instead of major compaction • Easier to use • No huge operations like major compaction (need to find when to run) • No 50%-free-disk worst-case requirement • Good read amplification and space amplification 39
  • 40. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Strategy #4: Time-Window Compaction ▪ Introduced in Cassandra 3.0.8, designed for time-series data ▪ Replaces Date-Tiered compaction strategy of Cassandra 2.1 (which is also supported by Scylla, but not recommended) 40
  • 41. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Time-Window compaction strategy (cont.) In a time-series use case: ▪ Clustering key and write time are correlated ▪ Data is added in time order. Only few out-of-order writes, typically rearranged by just a few seconds ▪ Data is only deleted through expiration (TTL) or by deleting an entire partition, usually the same TTL on all the data ▪ A query is a clustering-key range query on a given partition Most common query: "values from the last hour/day/week" 41
  • 42. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Time-Window compaction strategy (cont.) ▪ Scylla remembers in memory the minimum and maximum clustering key in each newly-flushed sstable o Efficiently find only the sstables with data relevant to a query ▪ But most compaction strategies, o Destroy this feature by merging “old” and “new” sstables o Move all rows of a partition to the same sstable… • But time series queries don’t need all rows of a partition, just rows in a given time range • Makes it impossible to expire old sstable’s when everything in them has expired • Read and write amplification (needless compactions) 42
  • 43. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Time-Window compaction strategy (cont.) So TWCS: ▪ Divides time into “time windows” o E.g., if typical query asks for 1 day of data, choose a time window of 1 day ▪ Divide sstables into time buckets, according to time window ▪ Compact using Size-Tiered strategy inside each time bucket o If the 2-day old window has just one big sstable and a repair creates an additional tiny “old” sstable, the two will not get compacted o A tradeoff: slows read but avoids the write amplification problem of DTCS ▪ When the current time window ends, major compaction of bucket o Except for small repair-produced sstables, we get 1 sstable per time window 43
  • 44. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Summary 44 Workload Size-Tiered Leveled Incremental Time-Window Write-only 2x peak space 2x writes Best - Overwrite Huge peak space write amplification high peak space, but not like size-tiered - Read-mostly, few updates read amplification Best read amplification - Read-mostly, but a lot of updates read and space amplification write amplification may overwhelm read amplification - Time series write, read, and space ampl. write and space amplification write and read amplification Best
  • 45. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company nyh@scylladb.com Please stay in touch Any questions? For more information, see my blog posts: 1. https://www.scylladb.com/2018/01/17/compaction-series-space-a mplification/ Space Amplification in Size-Tiered Compaction 2. https://www.scylladb.com/2018/01/31/compaction-series-leveled- compaction/ Write Amplification in Leveled Compaction 3. Soon to come, post about Incremental Compaction.
  • 46. PRESENTATION TITLE ON ONE LINE AND ON TWO LINES First and last name Position, company Questions? nyh@scylladb.com Please stay in touch Any questions? Thank you for listening!