Scylla 5.0 New
Features, Part 1
Avi Kivity, CTO; Eliran Sinvani, Software Team Leader; Botond
Dénes, Software Engineer; Tomasz Grabiec, Distinguished
Software Engineer; Kamil Braun, Software Engineer
I/O Scheduling
In ScyllaDB 5.0
Avi Kivity
CTO
Avi Kivity
■ Original maintainer of Linux KVM - Kernel-based Virtual Machine
■ Co-maintainer of Seastar, ScyllaDB
■ Co-founder of ScyllaDB
CTO
Why I/O Scheduling?
A database is a balancing act…
■ Your reads
■ Compaction
■ Repair/bootstrap/decommission
The spice bytes must flow
[Diagram: Read, Compaction, and Maintenance queues feed a single Scheduler, which feeds the Disk.]
Understanding disk performance
The new I/O Scheduler
■ Collect information about disks
■ Build a more accurate mathematical disk model
■ Embody the model into the I/O scheduler
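A minimal sketch of the modeling idea, not ScyllaDB's actual scheduler code (all names below are hypothetical): given measured bandwidth and IOPS figures for a disk, each request can be assigned a normalized cost, the fraction of a second of disk time it consumes, and the scheduler admits requests only as fast as the model says the disk can absorb them.

```cpp
#include <cstdint>

// Hypothetical disk model built from measured disk properties,
// e.g. bandwidth/IOPS numbers collected by benchmarking the disk.
struct disk_model {
    double read_bw;     // bytes/sec the disk sustains for reads
    double read_iops;   // read requests/sec the disk sustains
    double write_bw;    // bytes/sec for writes
    double write_iops;  // write requests/sec
};

// Normalized cost of a request in "disk seconds": a request pays for
// its size (bandwidth-bound term) and for simply existing (IOPS-bound
// term). Summed costs per second must stay at or below 1.0 for the
// disk to keep up.
double request_cost(const disk_model& d, uint64_t length, bool is_write) {
    double bw   = is_write ? d.write_bw   : d.read_bw;
    double iops = is_write ? d.write_iops : d.read_iops;
    return double(length) / bw + 1.0 / iops;
}
```

Under this toy model, a 128 KB read on a disk rated at 1 GB/s and 100k read IOPS costs about 0.000131 + 0.00001 ≈ 0.000141 disk seconds, so roughly 7,000 such reads per second saturate the disk.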
Thank you!
Stay in touch
Avi Kivity
@AviKivity
avi@scylladb.com
ScyllaDB 5.0
Workload Specific
Optimizations
Eliran Sinvani
Software Team Leader
Eliran Sinvani
■ Core SW Team Leader at ScyllaDB for the past 3 years.
■ BSP and Embedded SW Team Leader at Airspan Networks for
over a year.
■ 3 years as Embedded SW engineer in the Cellular industry (both
UE and BS sides).
Software Team Leader
Dealing With Different Workloads
As the number of use cases supported by Scylla keeps growing, we sometimes encounter conflicting requirements for different types of workloads.
■ Parallelism
■ Mean latency or P99
■ User priorities (different SLA requirements)
Workload Prioritization
The key to dealing with some differentiating aspects of workloads is provided in our enterprise version and is called workload prioritization. It already provides several benefits:
■ Resource isolation of workloads (CPU and Memory)
■ Prioritization of workloads
■ Can balance to some extent between OLAP and OLTP workloads
OLTP and OLAP latencies with workload prioritization enabled.
Problem: Workload Isolation Is Not Enough :(
■ We have solved (to some extent) the problem of cross workload impact on
resource allocation.
■ But:
● The way requirements are expressed is lacking:
● Only a quantitative description - we characterize workloads by shares: the more shares a workload gets relative to others, the more important it is and the more resources it will get.
● A lot of real-world requirements can’t be expressed
● A relative description somewhat breaks the isolation concept - if there is isolation, why should I care about the relation between workloads? (at least in some aspects)
● Some of Scylla’s configuration options and behaviours are global
● Timeouts
● Parallelism limitation (Botond’s talk: Improvements to the OOM resilience of reads on the replica
side)
Prioritization and isolation are simply not enough
Example: A web server database with analytics
Scenario:
1. Main workload: present information to a user in response to a click on a webpage.
2. Secondary workload: periodically run some DB-wide analytics.
Example: A web server database with analytics cont.
Main workload:
1. Needs at worst (timeout) tens to hundreds of ms of latency, or the page will appear unresponsive to some users.
2. Has high concurrency, as requests are independent.
Example: A web server database with analytics cont.
Secondary workload:
1. Needs as much throughput as possible.
2. Has bounded and controllable concurrency (since it originates from the same client/logic).
Example: A web server database with analytics cont.
The timeout dilemma:
1. We will need to set some timeout on the server side. This timeout must satisfy two constraints:
2. For the main workload it can’t be too high, since that will cause the interactive user either to retry (click again and again) or to drop the request; both waste resources, or even worse, can be experienced as unbounded concurrency by the server.
3. It can’t be too low, or the analytics requests will fail, since achieving high throughput normally increases latency as the queues fill up.
Example: A web server database with analytics cont.
Overload response:
1. The interactive client (main workload) can’t be throttled, since the requests are unrelated: delaying the response to application user A will not cause some other user B to delay or stop sending requests (unbounded concurrency).
2. The batch workload, in contrast, we would like to throttle, since this gives us a knob that controls the pace of the analytics workload (bounded concurrency).
Examples Of Workload Characteristics
■ Latency distribution (some approximate desired histogram)
■ Timeout
■ Throughput / latency orientation (i.e. OLTP vs. OLAP)
■ Expected parallelism
■ Burstiness
Benefits of Workload Characterization
■ Better cluster utilization, increased ability to serve multi-workload scenarios
● Side effect: serving on the same cluster means less administrative overhead.
■ Resource usage efficiency and more correct resource distribution
■ Better overload handling
■ More accurate metrics (e.g. the per-workload timeout metric is reliable, with no need to check latency distributions to calculate timeouts)
■ Smarter alerting when requirements can’t be met
■ Better isolation capabilities (not necessarily relative)
■ Can serve as a base for elasticity applications (e.g. grow to meet requirements)
The Service Level Mechanism
We have already implemented some capabilities using our service levels mechanism, which we imported from our enterprise version and extended to support more detailed workload configuration.
■ Service Level -
• Contains a workload characteristic (can be partial)
• Can be attached to a role
■ A connection’s workload characteristics are determined by merging all service levels attached to the authenticated role and its parent roles.
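As a rough illustration of that merging step (hypothetical types and merge rules; the authoritative rules live in the service-levels design notes referenced at the end of this talk), the effective characteristics could be computed by folding the attached service levels together, taking the strictest timeout:

```cpp
#include <algorithm>
#include <chrono>
#include <optional>
#include <vector>

// Hypothetical, simplified service level: every field is optional,
// since a service level may specify only part of the characteristics.
struct service_level {
    std::optional<std::chrono::milliseconds> timeout;
    std::optional<int> shares;  // Enterprise only
    enum class workload { unspecified, interactive, batch };
    workload type = workload::unspecified;
};

// Merge all service levels attached to a role and its parent roles.
// Assumed rules, for illustration only: the strictest (smallest)
// timeout wins, the highest shares win, and an explicit workload type
// overrides an unspecified one.
service_level merge(const std::vector<service_level>& attached) {
    service_level effective;
    for (const auto& sl : attached) {
        if (sl.timeout) {
            effective.timeout = effective.timeout
                ? std::min(*effective.timeout, *sl.timeout) : *sl.timeout;
        }
        if (sl.shares) {
            effective.shares = effective.shares
                ? std::max(*effective.shares, *sl.shares) : *sl.shares;
        }
        if (sl.type != service_level::workload::unspecified) {
            effective.type = sl.type;
        }
    }
    return effective;
}
```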
Workload characterization
Ideally, we would like Scylla to:
■ For main workload:
● Have low timeout (~30-100ms)
● Load shedding (fail excessive requests immediately), because the
server cannot cause the interactive workload to slow down.
● Dedicate most of the resources to this workload.
■ For secondary workload:
● Relatively high timeout (~10-30s)
● Throttling - delaying some responses to distribute the load over time
and control it.
● Use mostly unused resources (unused by the main workload). If we have interactive workloads, they will fluctuate and not always be at their peak level, which means we have resources lying around some of the time.
Workload characterization cont.
The ability to configure the workload characteristics is already implemented in Scylla.

CREATE SERVICE LEVEL main WITH timeout = 30ms AND workload_type = interactive AND shares = 800

CREATE SERVICE LEVEL secondary WITH timeout = 30s AND workload_type = batch AND shares = 200

shares = xxxx is only available in the Enterprise version.
Workload characterization
Ideally, we would like Scylla to do: “CREATE SERVICE LEVEL XXX WITH”:

For the main workload:
  Have low timeout (30ms)                        → timeout=30ms
  Load shedding (*)                              → AND workload_type=interactive
  Dedicate most of the resources to this
  workload (80% guaranteed resources) (**)       → AND shares=800

For the secondary workload:
  Have relatively high timeout (30s)             → timeout=30s
  Throttling                                     → AND workload_type=batch
  Use mostly unused resources
  (only 20% guaranteed resources) (**)           → AND shares=200

* Implemented but still hasn’t been tested extensively.
** Enterprise only
Workload characterization future improvements
■ Overload and timeout behaviour according to workload type (i.e. shedding vs. throttling).
■ Auto tuning according to workload
characterization.
■ Workload specific metrics and alerts.
■ Workload specific behaviours:
● Bypass cache as a default for analytics
● Cache division or isolation according to prioritization
● Disallowing filtering queries.
■ More precise and elaborate configuration
options.
References
■ https://scylla.docs.scylladb.com/master/design-notes/service_levels.html
■ https://docs.scylladb.com/using-scylla/workload-prioritization/
Thank you!
Stay in touch
Eliran Sinvani
eliransin@scylladb.com
ScyllaDB 5.0
Improvements to
the OOM Resilience of
Reads on the Replica
Botond Dénes
Software Engineer
■ Working @ ScyllaDB since 2017
■ Member of the storage team
Botond Dénes
Software Engineer
The basic idea
■ The concurrency of reads on the replica is controlled
• To keep concurrency within a useful limit
• To avoid resource exhaustion, in particular: OOM
■ Implemented via a semaphore
■ Semaphore is dual limited by count and memory
■ Separate semaphores for scheduling groups
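A minimal sketch of the dual-limit idea (hypothetical names; the real implementation is a Seastar-based semaphore with a wait queue, timeouts, and permits): a read is admitted only if both a count unit and its estimated memory are available, and both are returned when the read finishes.

```cpp
#include <cstddef>

// Hypothetical semaphore dual-limited by count and memory. A read must
// obtain one count unit *and* its memory estimate before proceeding;
// otherwise it is queued (queueing/timeout logic omitted).
class reader_concurrency_semaphore {
    size_t _count;   // remaining concurrent-read slots
    size_t _memory;  // remaining memory budget, in bytes
public:
    reader_concurrency_semaphore(size_t count, size_t memory)
        : _count(count), _memory(memory) {}

    // Try to admit a read estimated to need `mem` bytes.
    bool try_admit(size_t mem) {
        if (_count == 0 || _memory < mem) {
            return false;  // caller queues the read instead
        }
        --_count;
        _memory -= mem;
        return true;
    }

    // Called when the read finishes and its buffers are freed.
    void release(size_t mem) {
        ++_count;
        _memory += mem;
    }
};
```

In the real system the admitted read holds a permit that keeps charging memory to the semaphore as I/O and reader buffers are allocated, which is exactly what the tracking work on the next slide improves.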
Recent work - much tracking, such buffers
■ Track I/O buffers as soon as they are allocated
(instead of when read completes)
■ Track buffers used for parsing sstable data
■ Track reader buffers
■ (still not 100% of all buffers are tracked)
Result: reader permit everywhere and vastly improved tracking accuracy
Recent work - addressing the usual suspects
■ Unpaged reads
■ Reverse reads
■ & variations (unpaged full scan -- true story)
Introduce a special (soft, hard) limit pair for
these reads
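To illustrate the (soft, hard) pair (a hypothetical sketch, not the actual code): crossing the soft limit only logs a warning, so the offending query can be identified, while crossing the hard limit fails the read rather than risking an OOM.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>

// Hypothetical (soft, hard) memory limit pair applied to unpaged and
// reversed reads, which can legitimately consume far more memory than
// regular paged reads.
struct special_read_limits {
    std::size_t soft;  // crossing this logs a warning
    std::size_t hard;  // crossing this fails the read
};

void check_memory(const special_read_limits& limits, std::size_t consumed) {
    if (consumed > limits.hard) {
        throw std::runtime_error("unpaged/reversed read exceeded hard memory limit");
    }
    if (consumed > limits.soft) {
        std::fprintf(stderr, "warning: read exceeded soft memory limit\n");
    }
}
```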
Recent work - semaphore in the front
[Diagram: the reader stack before and after. Before: the memtable reader and cache reader feed a combined reader, and on a cache miss a restricted reader (the semaphore) wraps a second combined reader over sstable readers 1..N, so admission happens deep in the stack. After: the semaphore sits at the front of the read, and the stack is simply the memtable reader and cache reader feeding a combined reader over sstable readers 1..N.]
Recent work - better diagnostics
■ Dump memory diagnostics on OOM
■ Dump semaphore diagnostics on queue overload/timeout
So where are we now?
■ Read-induced OOM is actually quite rare(?) now
■ Doesn’t mean it’s gone for good; there might be corner cases
Thank you!
Stay in touch
Botond Dénes
dns.botond@gmail.com
ScyllaDB 5.0
SSTable Index
Caching
Tomasz Grabiec
Distinguished Software Engineer
Tomasz Grabiec
■ Core engineer and maintainer at ScyllaDB for the past 8 years
■ Started coding when Commodore 64 was still a thing
■ Lives in Cracow, Poland
Distinguished Software Engineer
SSTable indexing - what’s new?
■ Automatic caching of SSTable indexes
■ Reads from disk got faster!
■ …especially for large partitions
SSTable indexing
SELECT … WHERE key > …
[Diagram: the index maps keys to positions in the data file.]
SSTable indexing
[Diagram: the index is made up of a partition key index and a clustering key index.]
SSTable indexing
[Diagram: the partition key index lives on disk; a summary of it lives in RAM.]
SSTable indexing
Summary:
● Always loaded in RAM
● Decided when the sstable is written to disk
● Separate file on disk
SSTable indexing
Summary:
● 1:20k ratio of summary size to data file size
● Trade-off between memory footprint and speed of reads
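Putting the structure together, a partition lookup conceptually goes: binary-search the in-RAM summary, read one partition index page from disk, then jump to the data file. A hypothetical sketch (real keys and entries are more involved than the plain tokens used here):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical entries. One summary entry covers ~20k of data file.
struct summary_entry { uint64_t first_token; uint64_t index_offset; };
struct index_entry   { uint64_t token;       uint64_t data_offset;  };

// summary_in_ram: always resident. read_index_page: one disk I/O.
// Assumes the token is covered by the summary (>= the first entry).
uint64_t find_data_offset(
        const std::vector<summary_entry>& summary_in_ram,
        uint64_t token,
        std::vector<index_entry> (*read_index_page)(uint64_t offset)) {
    // 1. Binary search the in-RAM summary for the covering index page.
    auto s = std::upper_bound(
        summary_in_ram.begin(), summary_in_ram.end(), token,
        [](uint64_t t, const summary_entry& e) { return t < e.first_token; });
    --s;  // last summary entry whose first token is <= token

    // 2. Read that index page from disk (this is the I/O the new
    //    caches in 5.0 avoid on a hot path) and search within it.
    auto page = read_index_page(s->index_offset);
    auto i = std::lower_bound(
        page.begin(), page.end(), token,
        [](const index_entry& e, uint64_t t) { return e.token < t; });

    // 3. The index entry says where to start reading the data file.
    return i->data_offset;
}
```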
SSTable indexing - problem
■ Only the summary is permanently cached
■ Reads typically* need to touch the disk while walking the index
■ Increases load on the disk
■ Adds latency
* Partition index pages are shared among concurrent readers
SSTable indexing - problem
■ Large-partition workloads experience diminishing index-cache benefit as the partition size grows
■ The average amount of I/O needed is O(log(partition_size))
SSTable indexing - new in 5.0
■ The whole index can now be cached in memory
■ Populated on access (read-through)
■ Evicted on memory pressure
■ The partition index summary is still non-evictable and always resident
SSTable indexing
■ Reads for different rows still share access to parts of the index
■ Caching the index reduces the amount of I/O for future reads
SSTable indexing - large partition example
Partition size: 10 GB, Rows: 10 M, Index file size: 5 MB
I/O for a single row read, cold cache:
■ 2x 32 KB for partition index summary page read
■ 17x 4 KB for binary search in the clustering index read
■ 2x 32 KB for data file read
TOTAL: 196 KB, 21 I/O reqs, 20ms
With a hot index cache:
TOTAL: 64 KB, 2 I/O reqs, 0.2ms
SSTable indexing - large partition example
Partition size: 10 GB, Rows: 10 M, Index file size: 5 MB
scylla-5.0 -c1 -m4G
scylla-bench -workload uniform -mode read -limit 1 -concurrency 100 -partition-count 1 -clustering-row-count 10000000 -duration 60m
Before: 2,011 rows/s
After: 6,191 rows/s
(the node was bound by disk bandwidth, ~530 MB/s)
SSTable indexing - Index file page cache
■ Populated on index file access (read-through cache)
■ Granularity: 4 KB chunk of an index file
■ Idea similar to the page cache in Linux
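A minimal read-through page cache sketch (hypothetical, std-only; the real cache is Seastar-based and is evicted under memory pressure via the unified LRU described later): pages are keyed by their 4 KB-aligned file offset and fetched from disk only on a miss.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr uint64_t page_size = 4096;

// Hypothetical read-through cache over an index file: 4 KB pages keyed
// by page-aligned offset. LRU eviction under memory pressure omitted.
class index_file_page_cache {
    std::unordered_map<uint64_t, std::vector<uint8_t>> _pages;
    std::vector<uint8_t> (*_read_from_disk)(uint64_t offset);  // one I/O
public:
    explicit index_file_page_cache(std::vector<uint8_t> (*rd)(uint64_t))
        : _read_from_disk(rd) {}

    const std::vector<uint8_t>& get_page(uint64_t file_offset) {
        uint64_t key = file_offset / page_size * page_size;
        auto it = _pages.find(key);
        if (it == _pages.end()) {   // miss: read through to disk
            it = _pages.emplace(key, _read_from_disk(key)).first;
        }
        return it->second;          // hit: served from RAM, no I/O
    }
};
```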
SSTable indexing - Index file page cache
■ The clustering key index is cached by means of the index file page cache
■ The on-disk representation is random-access, so there is no need to keep parsed entries
SSTable indexing
[Diagram: a partition key index page, the span of the partition key index covered by one summary entry.]
SSTable indexing - partition index page cache
■ Granularity: partition index summary page
■ Contains parsed index pages for fast lookup (the on-disk representation is not random-access)
■ Saves CPU time compared to having just the index file page cache
SSTable index caching
■ No tunables; caches use all available free space
■ Multiple caches compete for space:
● row cache
● sstable partition index page cache
● sstable index file page cache
■ Fair eviction, single LRU for all caches
● E.g. no reads from disk => row cache uses all the free space
● Not optimal for all workloads
Thank you!
Stay in touch
Tomasz Grabiec
@tgrabiec
tgrabiec@scylladb.com
ScyllaDB 5.0
Improved Reversed
Queries
Kamil Braun
Software Engineer
Kamil Braun
■ Software engineer working on Scylla
■ Passionate about distributed systems, functional programming,
and formal methods in software development
■ Graduated from the University of Warsaw with an MSc in Computer Science and a BSc in Mathematics
Software Engineer
What are reversed queries?
CREATE TABLE ks.t (
pk int,
ck int,
v int,
PRIMARY KEY (pk, ck)
) WITH CLUSTERING ORDER BY (ck ASC)
Reversed query:
SELECT * FROM ks.t WHERE pk = 0 ORDER BY ck DESC;
CREATE TABLE ks.t (
pk int,
ck int,
v int,
PRIMARY KEY (pk, ck)
) WITH CLUSTERING ORDER BY (ck ASC)
SELECT * FROM ks.t WHERE pk = 0
(or SELECT * FROM ks.t WHERE pk = 0 ORDER BY ck ASC):
pk | ck | v
----+----+---
0 | 0 | 0
0 | 1 | 2
0 | 2 | 3
CREATE TABLE ks.t (
pk int,
ck int,
v int,
PRIMARY KEY (pk, ck)
) WITH CLUSTERING ORDER BY (ck ASC)
SELECT * FROM ks.t WHERE pk = 0 ORDER BY ck DESC:
pk | ck | v
----+----+---
0 | 2 | 3
0 | 1 | 2
0 | 0 | 0
Before
Query range: [6, 16]
E.g.: SELECT * from ks.t WHERE pk = 0 AND ck >= 6 AND ck <= 16;

[Diagram: the old implementation reads the whole range [6, 16] from the sstable into memory, reverses it, and returns only the first page (rows 16..13); reading and reversing the rest of the range is wasted work. For the second page, the remaining range [6, 12] is read into memory again, reversed, and rows 12..9 are returned, repeating the wasted work.]
Problem 1: quadratic complexity
N pages: N + (N-1) + (N-2) + … + 1 = O(N^2) pages read
Problem 2: huge memory consumption
To read a single page from the range, we need to fetch the entire range into
memory.
It may not even fit in memory, causing the read to fail.
After
Query range: [6, 16]

[Diagram: the new implementation asks the sstable index where the tail of the range lives (“I know where 14 is”) and starts reading there instead of at the beginning. It reads forward to the end of the range (rows 14, 15, 16), then walks backwards: each row’s metadata records the previous row’s size, so the reader can compute where row 13 starts, then row 12, and so on. Once a page worth of rows (16..12) is in memory, the page is reversed and returned; nothing outside the page is read.]
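A sketch of that backwards walk (hypothetical structures; the real code parses the mc sstable format, whose row metadata is what makes this possible): knowing where one row starts and how large the previous row is lets the reader hop row by row toward the start of the range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row header: the mc format records enough metadata to
// recover where the previous row begins.
struct row_header {
    uint64_t previous_row_size;  // bytes of the row preceding this one
};

// Walk backwards from a known row position, collecting up to
// `page_rows` row offsets in reverse clustering order.
std::vector<uint64_t> collect_reversed_page(
        uint64_t start_pos,                         // last row in the range
        std::size_t page_rows,
        row_header (*read_header)(uint64_t pos)) {  // small read per row
    std::vector<uint64_t> offsets;
    uint64_t pos = start_pos;
    while (offsets.size() < page_rows) {
        offsets.push_back(pos);
        row_header h = read_header(pos);
        if (h.previous_row_size == 0) {
            break;  // reached the first row of the partition
        }
        pos -= h.previous_row_size;  // hop to the previous row's start
    }
    return offsets;  // already in descending clustering order
}
```

This is why both the time (one visit per returned row) and the memory (one page) are linear in the page, not in the queried range.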
New implementation:
■ Linear complexity
■ Memory consumption is O(page size)
Reversed reads from memtables were also improved.
Caveat: reversed queries are allowed for single-partition queries only.
Comparison
■ Querying different partitions of sizes 10MB, 15MB, …, 110MB, 115MB
■ Forward and reversed queries
■ Scylla 4.5 branch vs master branch (as of 30.12.2021)
Schema: pk int, ck int, v text, primary key (pk, ck)
Query:
SELECT * FROM ks.t WHERE pk = ?
{ORDER BY ck DESC}
BYPASS CACHE
{LIMIT 1000}
Summary
■ Reversed queries in Scylla <= 4.5:
● time complexity quadratic w.r.t. the size of the queried range
● memory consumption linear w.r.t. the size of the queried range
■ The mc sstable format allows a better implementation
■ Reversed queries in the upcoming release:
● time complexity linear w.r.t. the size of the queried range
● memory consumption linear w.r.t. the page size
Thank you!
Stay in touch
Kamil Braun
kbraun@scylladb.com

More Related Content

What's hot

Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephScyllaDB
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlJiangjie Qin
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with ScyllaScyllaDB
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2ScyllaDB
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Getting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterGetting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterScyllaDB
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafkaconfluent
 
Running MariaDB in multiple data centers
Running MariaDB in multiple data centersRunning MariaDB in multiple data centers
Running MariaDB in multiple data centersMariaDB plc
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraScyllaDB
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance TuningLars Hofhansl
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaJiangjie Qin
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 

What's hot (20)

Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with Scylla
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2Scylla Summit 2022: Scylla 5.0 New Features, Part 2
Scylla Summit 2022: Scylla 5.0 New Features, Part 2
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Getting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers FasterGetting the Scylla Shard-Aware Drivers Faster
Getting the Scylla Shard-Aware Drivers Faster
 
Disaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache KafkaDisaster Recovery Plans for Apache Kafka
Disaster Recovery Plans for Apache Kafka
 
Running MariaDB in multiple data centers
Running MariaDB in multiple data centersRunning MariaDB in multiple data centers
Running MariaDB in multiple data centers
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Lightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache CassandraLightweight Transactions in Scylla versus Apache Cassandra
Lightweight Transactions in Scylla versus Apache Cassandra
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 

Similar to Scylla Summit 2022: Scylla 5.0 New Features, Part 1

How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintScyllaDB
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadScyllaDB
 
adap-stability-202310.pptx
adap-stability-202310.pptxadap-stability-202310.pptx
adap-stability-202310.pptxMichael Ming Lei
 
Functional? Reactive? Why?
Functional? Reactive? Why?Functional? Reactive? Why?
Functional? Reactive? Why?Aleksandr Tavgen
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the fieldJoAnna Cheshire
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP PerformanceBIOVIA
 
Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013Ivan Sanders
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsJoel Oleson
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Handling Massive Writes
Handling Massive WritesHandling Massive Writes
Handling Massive WritesLiran Zelkha
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsKeeyong Han
 
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...BI Brainz
 
How_To_Soup_Up_Your_Farm
How_To_Soup_Up_Your_FarmHow_To_Soup_Up_Your_Farm
How_To_Soup_Up_Your_FarmNigel Price
 
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Continuent
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward
 
Towards User-Defined SLA in Cloud Flash Storage.pptx
Towards User-Defined SLA in Cloud Flash Storage.pptxTowards User-Defined SLA in Cloud Flash Storage.pptx
Towards User-Defined SLA in Cloud Flash Storage.pptxPo-Chuan Chen
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftJie Li
 

Similar to Scylla Summit 2022: Scylla 5.0 New Features, Part 1 (20)

How Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter FootprintHow Workload Prioritization Reduces Your Datacenter Footprint
How Workload Prioritization Reduces Your Datacenter Footprint
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
MongoDB
MongoDBMongoDB
MongoDB
 
How to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another WorkloadHow to Meet Your P99 Goal While Overcommitting Another Workload
How to Meet Your P99 Goal While Overcommitting Another Workload
 
Performance tuning in sql server
Performance tuning in sql serverPerformance tuning in sql server
Performance tuning in sql server
 
adap-stability-202310.pptx
adap-stability-202310.pptxadap-stability-202310.pptx
adap-stability-202310.pptx
 
Functional? Reactive? Why?
Functional? Reactive? Why?Functional? Reactive? Why?
Functional? Reactive? Why?
 
Sql server tips from the field
Sql server tips from the fieldSql server tips from the field
Sql server tips from the field
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013Real world business workflow with SharePoint designer 2013
Real world business workflow with SharePoint designer 2013
 
Large Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint DeploymentsLarge Scale SQL Considerations for SharePoint Deployments
Large Scale SQL Considerations for SharePoint Deployments
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Handling Massive Writes
Handling Massive WritesHandling Massive Writes
Handling Massive Writes
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
Analysing and Troubleshooting Performance Issues in SAP BusinessObjects BI Re...
 
How_To_Soup_Up_Your_Farm
How_To_Soup_Up_Your_FarmHow_To_Soup_Up_Your_Farm
How_To_Soup_Up_Your_Farm
 
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
Webinar Slides: High Noon at AWS — Amazon RDS vs. Tungsten Clustering with My...
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
 
Towards User-Defined SLA in Cloud Flash Storage.pptx
Towards User-Defined SLA in Cloud Flash Storage.pptxTowards User-Defined SLA in Cloud Flash Storage.pptx
Towards User-Defined SLA in Cloud Flash Storage.pptx
 
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon RedshiftPowering Interactive Data Analysis at Pinterest by Amazon Redshift
Powering Interactive Data Analysis at Pinterest by Amazon Redshift
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 

Recently uploaded

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Scylla Summit 2022: Scylla 5.0 New Features, Part 1

  • 1. Scylla 5.0 New Features, Part 1 Avi Kivity, CTO; Eliran Sinvani, Software Team Leader; Botond Dénes, Software Engineer; Tomasz Grabiec, Distinguished Software Engineer; Kamil Braun, Software Engineer
  • 2. I/O Scheduling In ScyllaDB 5.0 Avi Kivity CTO
  • 3. Avi Kivity ■ Original maintainer of Linux KVM - Kernel-based Virtual Machine ■ Co-maintainer of Seastar, ScyllaDB ■ Co-founder of ScyllaDB CTO
  • 4. A database is a balancing act… ■ Your reads ■ Compaction ■ Repair/bootstrap/decommission Why I/O Scheduling?
  • 5. The spice bytes must flow Read Queue Compaction Queue Maintenance Queue Scheduler Disk
  • 7. The new I/O Scheduler ■ Collect information about disks ■ Build a more accurate mathematical disk model ■ Embody the model into the I/O scheduler
  • 8. Thank you! Stay in touch Avi Kivity @AviKivity avi@scylladb.com
  • 10. Eliran Sinvani ■ Core SW Team Leader at ScyllaDB for the past 3 years. ■ BSP and Embedded SW Team Leader at Airspan Networks for over a year. ■ 3 years as Embedded SW engineer in the Cellular industry (both UE and BS sides). Software Team Leader
  • 11. Dealing With Different Workloads As the number of use cases supported by Scylla gets bigger consistently, we sometimes encounter conflicting requirements for different types of workloads. ■ Parallelism ■ Mean latency or P99 ■ Users priorities (different SLA requirements)
  • 12. Workload Prioritization The key for dealing with some differentiating aspects of workloads is provided in our enterprise version and is called workload prioritization it provides several benefits already: ■ Resource isolation of workloads (CPU and Memory) ■ Prioritization of workloads ■ Can balance to some extent between OLAP and OLTP workloads
  • 13. OLTP and OLAP latencies with workload prioritization enabled.
  • 14. Problem: Workload Isolation Is Not Enough :( ■ We have solved (to some extent) the problem of cross workload impact on resource allocation. ■ But: ● Expressing the requirements is lacking: ● A quantitative description - we characterize workload by shares, the more shares a workload gets relative to others the more important it is and more resources it will get. ● A lot of real world requirements can’t be expressed ● Relative description somewhat breaks the isolation concept - if there is an isolation, what do I care about relation between workloads? (at least on some aspects) ● Some of Scylla’s configuration options and behaviours are global ● Timeouts ● Parallelism limitation (Botond’s talk: Improvements to the OOM resilience of reads on the replica side)
  • 15. Prioritization and isolation is simply not enough
  • 16. Example: A web server database with analytics Scenario: 1. Main workload: we would like to present to a user some information in response to a click on a webpage. 2. Secondary workload: Periodically we would like to run some DB wide analytics.
  • 17. Example: A web server database with analytics cont. Main workload: 1. Need to have at worst (timeout) tens to hundreds of ms latency or the page will appear unresponsive for some users. 2. Has high concurrency as requests are independent.
  • 18. Example: A web server database with analytics cont. Secondary workload: 1. Needs to have as much throughput as possible 2. Has bounded and controllable concurrency. (since it is originated at the same client/logic)
  • 19. Example: A web server database with analytics cont. The timeout dilemma: 1. We will need to set some timeout for the server side. This timeout should follow: 2. For the main workload this can’t be too high, since it will cause the interactive user either to retry (click again and again) or to drop the request, both of which will be a waste of resources or even worse, can be experienced as unbounded concurrency by the server. 3. Can’t be too low as the analytics requests will fail, since achieving high throughput will normally increase latency since the queues are full.
  • 20. Example: A web server database with analytics cont. Overload response: 1. Interactive client (main workload) can’t be throttled since the requests are unrelated, delaying response to some application user A will not cause some other user B to delay or stop sending requests (Unbounded concurrency) 2. In order to control the batch workload better we would like to throttle since this will allow us to have a knob that controls the pace of the analytics workload. (Bounded concurrency)
  • 21. Examples Of Workload Characteristics ■ Latency distribution (some approximate desired histogram) ■ Timeout ■ Throughput / Latency orientation (ie. OLTP vs OLAP) ■ Expected parallelism ■ Burstiness
  • 22. Benefits of Workload Characterization ■ Better cluster utilization, increased ability to serve multi workload scenarios ● Side effect: serving on the same cluster means less administrative overhead. ■ Resource usage efficiency and more correct resource distribution ■ Better overload handling ■ More accurate metrics (i.e the per workload timeout metric is reliable and no need to check latency distribution to calculate timeouts). ■ Smarter alerting when requirements can’t be met ■ Better isolation capabilities (not necessarily relative) ■ Can serve as a base for elasticity application (i.e Grow to meet requirements)
  • 23. The Service Level Mechanism We have already implemented some capabilities using our service levels mechanism which we have imported from our enterprise version and extended to support more detailed workload configuration. ■ Service Level - • Contains a workload characteristic (can be partial) • Can be attached to a role ■ A connection workload characteristics are determined by merging all service levels attached to the authenticated role and its parent roles.
  • 24. Workload characterization Ideally we would like scylla to do: ■ For main workload: ● Have low timeout (~30-100ms) ● Load shedding (fail excessive requests immediately), because the server cannot cause the interactive workload to slow down. ● Dedicate most of the resources to this workload. ■ For secondary workload: ● Relatively high timeout (~10-30s) ● Throttling - delaying some responses to distribute the load over time and control it. ● Use mostly unused resources (unused by the main workload), if we have interactive workloads, it means that those will fluctuate and not always be at their peak level, this means we have resources lying around some of the time.
  • 25. Workload characterization cont. The ability to configure the workload characteristics is already implemented in Scylla. CREATE SERVICE LEVEL main WITH timeout = 30ms AND workload_type = interactive AND shares = 800; CREATE SERVICE LEVEL secondary WITH timeout = 30s AND workload_type = batch AND shares = 200; shares = xxx is only available in the Enterprise version.
  • 26. Workload characterization Ideally, we would like Scylla to do ("CREATE SERVICE LEVEL XXX WITH ..."): ■ For the main workload: have a low timeout (30ms) -> timeout = 30ms; load shedding (*) -> AND workload_type = interactive; dedicate most of the resources to this workload (80% guaranteed resources) (**) -> AND shares = 800. ■ For the secondary workload: have a relatively high timeout (30s) -> timeout = 30s; throttling -> AND workload_type = batch; use mostly unused resources (only 20% guaranteed resources) (**) -> AND shares = 200. * Implemented but not yet extensively tested. ** Enterprise only
  • 27. Workload characterization future improvements ■ Overload and timeout behaviour according to workload type (i.e. shedding vs. throttling). ■ Auto-tuning according to workload characterization. ■ Workload-specific metrics and alerts. ■ Workload-specific behaviours: ● Bypass cache as a default for analytics ● Cache division or isolation according to prioritization ● Disallowing filtering queries. ■ More precise and elaborate configuration options.
  • 29. Thank you! Stay in touch Eliran Sinvani eliransin@scylladb.com
  • 30. ScyllaDB 5.0 Improvements to the OOM Resilience of Reads on the Replica Botond Dénes Software Engineer
  • 31. ■ Working @ ScyllaDB since 2017 ■ Member of the storage team Botond Dénes Software Engineer
  • 32. The basic idea ■ The concurrency of reads on the replica is controlled • To keep concurrency within a useful limit • To avoid resource exhaustion, in particular: OOM ■ Implemented via a semaphore ■ Semaphore is dual limited by count and memory ■ Separate semaphores for scheduling groups
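To make the accounting concrete, here is a minimal single-threaded C++ sketch of such a dual-limited semaphore. This is an illustration only, not Scylla's actual reader concurrency semaphore (the real one is asynchronous, queues waiters, and exists per scheduling group); the 100 counts and 2% of shard memory follow the typical values mentioned in the talk, while the admission and buffer sizes are made up for the example.

#include <cstdint>
#include <iostream>

// Toy dual-limited semaphore: a read is admitted only if both a count
// slot and enough memory units are available.
class read_semaphore {
    int64_t _count;
    int64_t _memory;
public:
    read_semaphore(int64_t count, int64_t memory) : _count(count), _memory(memory) {}

    // Admission consumes 1 count and a fixed initial amount of memory.
    bool try_admit(int64_t initial_memory) {
        if (_count < 1 || _memory < initial_memory) {
            return false; // in Scylla the read would queue here (and may time out)
        }
        _count -= 1;
        _memory -= initial_memory;
        return true;
    }

    // As the read progresses, every buffer it allocates is charged here --
    // the "track buffers from the moment they are allocated" improvement.
    bool try_consume(int64_t bytes) {
        if (_memory < bytes) return false;
        _memory -= bytes;
        return true;
    }

    void release(int64_t bytes) {
        _count += 1;
        _memory += bytes;
    }
};

int main() {
    const int64_t shard_memory = 4LL << 30;      // a 4 GB shard
    read_semaphore sem(100, shard_memory / 50);  // 100 counts, 2% of shard memory
    const int64_t admission_cost = 128 * 1024;   // hypothetical fixed admission charge
    if (sem.try_admit(admission_cost)) {
        sem.try_consume(1 << 20);                // e.g. a 1 MiB I/O buffer
        sem.release(admission_cost + (1 << 20)); // read finished: return all units
        std::cout << "read admitted and completed\n";
    }
}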
  • 33. Recent work - much tracking, such buffers ■ Track I/O buffers as soon as they are allocated (instead of when the read completes) ■ Track buffers used for parsing sstable data ■ Track reader buffers ■ (still not 100% of all buffers are tracked) Result: reader permit everywhere and vastly improved tracking accuracy
  • 34. Recent work - addressing the usual suspects ■ Unpaged reads ■ Reverse reads ■ & variations (unpaged full scan -- true story) Introduce a special (soft, hard) limit pair for these reads
  • 35. Recent work - semaphore in the front [diagram: two reader trees. Before: a memtable reader and a cache reader feed a combined reader, and on a cache miss the disk is read through a restricted reader wrapping a combined reader over sstable readers 1..N - i.e. the semaphore only guards the disk readers. After: the same tree without the restricted reader - the semaphore now sits in front, before any readers are created]
  • 36. Recent work - better diagnostics ■ Dump memory diagnostics on OOM ■ Dump semaphore diagnostics on queue overload/timeout
  • 37. So where are we now? ■ Read-induced OOM is actually quite rare(?) now ■ Doesn't mean it's gone for good; there might be corner cases
  • 38. Thank you! Stay in touch Botond Dénes dns.botond@gmail.com
  • 39. ScyllaDB 5.0 SSTable Index Caching Tomasz Grabiec Distinguished Software Engineer
  • 40. Tomasz Grabiec ■ Core engineer and maintainer at ScyllaDB for the past 8 years ■ Started coding when Commodore 64 was still a thing ■ Lives in Cracow, Poland Distinguished Software Engineer
  • 41. SSTable indexing - what’s new? ■ Automatic caching of SSTable indexes ■ Reads from disk got faster! ■ …especially for large partitions
  • 42. SSTable indexing [diagram: a SELECT … WHERE key > … query uses the index to narrow down the matching location in the data file]
  • 43. SSTable indexing [diagram: the index as a search tree - the top levels form the partition key index, the bottom levels the clustering key index]
  • 44. SSTable indexing [diagram: the partition key index is split into two levels, the top one being the summary]
  • 45. SSTable indexing [diagram: the summary resides in RAM, the rest of the partition key index on disk] Summary: ● Always loaded in RAM ● Decided when the sstable is written to disk ● Separate file on disk
  • 46. SSTable indexing [diagram: the summary in RAM, the partition key index on disk] ● 1:20k ratio of summary size to data file size ● Trade-off between memory footprint and speed of reads
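For a sense of scale, the 1:20k ratio keeps the always-resident summary small even for big sstables; a worked example (not a figure from the talk): summary_size ≈ data_file_size / 20,000, so a 1 TB data file needs only about 1 TB / 20,000 ≈ 50 MB of summary in RAM.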
  • 47. SSTable indexing - problem ■ Only the summary is permanently cached ■ Reads typically* need to touch the disk while walking the index ■ Increases load on the disk ■ Adds latency * Partition index pages are shared among concurrent readers
  • 48. SSTable indexing - problem ■ Large partition workloads experience diminishing index caching benefit as the partition size grows ■ The average amount of I/O needed is O(log(partition_size))
  • 49. SSTable indexing - new in 5.0 ■ The whole index can now be cached in memory ■ Populated on access (read-through) ■ Evicted on memory pressure ■ The partition index summary is still non-evictable and always resident
  • 50. SSTable indexing ■ Reads for different rows still share access to parts of the index ■ Caching the index reduces the amount of I/O for future reads
  • 51. SSTable indexing - large partition example Partition size: 10 GB, Rows: 10 M, Index file size: 5 MB I/O for a single row read, cold cache: ■ 2x 32 KB for partition index summary page read ■ 17x 4 KB for binary search in the clustering index read ■ 2x 32 KB for data file read TOTAL: 196 KB, 21 I/O reqs, 20ms
  • 52. SSTable indexing - large partition example Partition size: 10 GB, Rows: 10 M, Index file size: 5 MB I/O for a single row read: ■ 2x 32 KB for partition index summary page read ■ 17x 4 KB for binary search in the clustering index ■ 2x 32 KB for data file read TOTAL (cold cache): 196 KB, 21 I/O reqs, 20 ms TOTAL (hot index cache): 64 KB, 2 I/O reqs, 0.2 ms
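The slide's totals are easy to verify: cold cache, 2x32 KB + 17x4 KB + 2x32 KB = 64 + 68 + 64 = 196 KB over 2 + 17 + 2 = 21 requests; with the index fully cached only the two 32 KB data-file reads remain, i.e. 2x32 KB = 64 KB over 2 requests.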
  • 53. SSTable indexing - large partition example Partition size: 10 GB, Rows: 10 M, Index file size: 5 MB scylla-5.0 -c1 -m4G scylla-bench -workload uniform -mode read -limit 1 -concurrency 100 -partition-count 1 -clustering-row-count 10000000 -duration 60m Before: 2,011 rows/s After: 6,191 rows/s (the node was bound by disk bandwidth, ~530 MB/s)
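Note that on a bandwidth-bound disk the speedup lines up with the per-read byte savings worked out above: 196 KB / 64 KB ≈ 3.06, which matches 6,191 / 2,011 ≈ 3.08.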
  • 54. SSTable indexing - Index file page cache ■ Populated on index file access (read-through cache) ■ Granularity: 4 KB chunk of an index file ■ Idea similar to the page cache in Linux [diagram: 4K chunks of the on-disk index file cached in RAM]
  • 55. SSTable indexing - Index file page cache ■ The clustering key index is cached by means of the index file page cache ■ The on-disk representation is random-access, so there is no need to keep parsed entries
  • 56. SSTable indexing [diagram: the summary and the partition key index, with a single partition key index page highlighted]
  • 57. SSTable indexing - partition index page cache ■ Granularity: partition index summary page ■ Contains parsed index pages for fast lookup (the on-disk representation is not random-access) ■ Saves CPU time compared to having just the index file page cache
  • 58. ■ No tunables, caches use all available free space ■ Multiple caches compete for space: ● row cache ● sstable partition index page cache ● sstable index file page cache ■ Fair eviction, single LRU for all caches ● E.g. no reads from disk => row cache uses all the free space ● Not optimal for all workloads SSTable index caching
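The "single LRU shared by all caches" idea can be sketched as follows - a toy C++ model with hypothetical names, not Scylla's actual cache code: whichever cache has not been touched recently gives up its entries first, so an idle cache naturally shrinks while a busy one grows.

#include <cstddef>
#include <iostream>
#include <list>
#include <string>

struct entry {
    std::string owner; // e.g. "row cache" or "sstable index page cache"
    size_t bytes;
};

class unified_lru {
    std::list<entry> _lru; // front = most recently used
    size_t _used = 0;
    size_t _limit;
public:
    explicit unified_lru(size_t limit) : _limit(limit) {}

    void insert(entry e) {
        _used += e.bytes;
        _lru.push_front(std::move(e));
        // Fair eviction: the least recently used entry goes,
        // no matter which cache it belongs to.
        while (_used > _limit && !_lru.empty()) {
            const entry& victim = _lru.back();
            std::cout << "evicting " << victim.bytes << " B from " << victim.owner << '\n';
            _used -= victim.bytes;
            _lru.pop_back();
        }
    }
};

int main() {
    unified_lru cache(1024); // total free space shared by all caches
    // Index pages fill the cache first...
    for (int i = 0; i < 8; ++i) cache.insert({"sstable index page cache", 128});
    // ...then a burst of row-cache activity evicts the coldest index pages.
    for (int i = 0; i < 4; ++i) cache.insert({"row cache", 128});
}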
  • 59. Thank you! Stay in touch Tomasz Grabiec @tgrabiec tgrabiec@scylladb.com
  • 61. Kamil Braun ■ Software engineer working on Scylla ■ Passionate about distributed systems, functional programming, and formal methods in software development ■ Graduated from the University of Warsaw with an MSc in Computer Science and a BSc in Mathematics Software Engineer
  • 62. What are reversed queries? CREATE TABLE ks.t ( pk int, ck int, v int, PRIMARY KEY (pk, ck) ) WITH CLUSTERING ORDER BY (ck ASC) Reversed query: SELECT * FROM ks.t WHERE pk = 0 ORDER BY ck DESC;
  • 63. CREATE TABLE ks.t ( pk int, ck int, v int, PRIMARY KEY (pk, ck) ) WITH CLUSTERING ORDER BY (ck ASC) SELECT * FROM ks.t WHERE pk = 0 (or SELECT * FROM ks.t WHERE pk = 0 ORDER BY ck ASC):
     pk | ck | v
    ----+----+---
      0 |  0 | 0
      0 |  1 | 2
      0 |  2 | 3
  • 64. CREATE TABLE ks.t ( pk int, ck int, v int, PRIMARY KEY (pk, ck) ) WITH CLUSTERING ORDER BY (ck ASC) SELECT * FROM ks.t WHERE pk = 0 ORDER BY ck DESC:
     pk | ck | v
    ----+----+---
      0 |  2 | 3
      0 |  1 | 2
      0 |  0 | 0
  • 66. Query range: [6, 16] E.g.: SELECT * FROM ks.t WHERE pk = 0 AND ck >= 6 AND ck <= 16; [diagram: the sstable on disk with the range 6..16 marked; memory still empty]
  • 67. Query range: [6, 16] [diagram: the whole range 6..16 is read from the sstable into memory]
  • 68. Query range: [6, 16] [diagram: the in-memory range is reversed and the first page, rows 13..16, is returned]
  • 69. Query range: [6, 16] [diagram: same as above, with the rows fetched but not returned (6..12) marked as wasted work]
  • 70. Query range: [6, 12] [diagram: for the second page, the remaining range 6..12 is read into memory again, reversed, and rows 9..12 are returned - the rest is again wasted work]
  • 71. Problem 1: quadratic complexity [diagram: the re-read ranges shrinking page by page]
  • 72. Problem 1: quadratic complexity N pages: N + (N-1) + (N-2) + … + 1 = O(N^2) pages read
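Worked out, this is the arithmetic series N + (N-1) + … + 1 = N(N+1)/2, i.e. on the order of N² page reads to serve N pages; for example, 10 pages of data cost 10 + 9 + … + 1 = 55 page reads.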
  • 73. Problem 2: huge memory consumption To read a single page from the range, we need to fetch the entire range into memory. It may not even fit in memory, causing the read to fail.
  • 74. After
  • 75. Query range: [6, 16] [diagram: the index is consulted - it knows where row 14 starts, the nearest known position before the range's upper bound 16]
  • 76. Query range: [6, 16] [diagram: a chunk of data starting at row 14's position is read into memory]
  • 77. Query range: [6, 16] [diagram: parsing forward from row 14 yields the positions of rows 15 and 16]
  • 78. Query range: [6, 16] [diagram: data beyond the end of the queried range is discarded]
  • 79. Query range: [6, 16] [diagram: another chunk, ending just before row 14, is read into memory]
  • 80. Query range: [6, 16] [diagram: problem - where does row 13 start within this buffer?]
  • 81. Query range: [6, 16] [diagram: row metadata to the rescue - each row stores the previous row's size, which pinpoints row 13]
  • 82. Query range: [6, 16] [diagram: chaining through the previous-row sizes reaches row 12]
  • 83. Query range: [6, 16] [diagram: the collected rows are reversed and the page 16..12 is returned]
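The backward walk can be sketched in a few lines of C++. The record layout here ([u32 prev_size][u32 key][payload]) is invented purely for illustration - real mc sstables encode far more - but it shows the essential trick: the stored previous-row size lets the reader chain backwards from a position supplied by the index.

#include <cstdint>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Append one record: [u32 prev_size][u32 key][payload]; returns its size.
static uint32_t append_record(std::vector<uint8_t>& buf, uint32_t prev_size,
                              uint32_t key, const std::string& payload) {
    size_t start = buf.size();
    auto put_u32 = [&](uint32_t v) {
        uint8_t b[4];
        std::memcpy(b, &v, 4);
        buf.insert(buf.end(), b, b + 4);
    };
    put_u32(prev_size);
    put_u32(key);
    buf.insert(buf.end(), payload.begin(), payload.end());
    return static_cast<uint32_t>(buf.size() - start);
}

static uint32_t read_u32(const std::vector<uint8_t>& buf, size_t pos) {
    uint32_t v;
    std::memcpy(&v, buf.data() + pos, 4);
    return v;
}

int main() {
    std::vector<uint8_t> buf;
    uint32_t prev = 0;
    size_t last_start = 0;
    // Rows 12..16 with variable-size payloads: without prev_size there
    // would be no way to step backwards through them.
    for (uint32_t key = 12; key <= 16; ++key) {
        last_start = buf.size();
        prev = append_record(buf, prev, key, std::string(key, 'x'));
    }
    // Reverse read: start at the last row (in a real read the index
    // supplies this position) and chain through the prev_size fields.
    size_t pos = last_start;
    for (;;) {
        uint32_t prev_size = read_u32(buf, pos);
        std::cout << "row ck=" << read_u32(buf, pos + 4) << '\n'; // 16, 15, ..., 12
        if (prev_size == 0) break; // reached the first row
        pos -= prev_size;
    }
}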
  • 84. New implementation: ■ Linear complexity ■ Memory consumption is O(page size)
  • 85. New implementation: ■ Linear complexity ■ Memory consumption is O(page size) Reversed reads from memtables were also improved.
  • 86. New implementation: ■ Linear complexity ■ Memory consumption is O(page size) Reversed reads from memtables were also improved. Caveat: reversed queries are allowed for single-partition queries only.
  • 87. Comparison ■ Querying different partitions of sizes 10MB, 15MB, …, 110MB ■ Forward and reversed queries ■ Scylla 4.5 branch vs. master branch (as of 30.12.2021)
  • 88. Comparison Schema: pk int, ck int, v text, primary key (pk, ck)
  • 89. Comparison Schema: pk int, ck int, v text, primary key (pk, ck) Query: SELECT * FROM ks.t WHERE pk = ? {ORDER BY ck DESC} {LIMIT 1000} BYPASS CACHE
  • 90. [graph: query duration vs. partition size with LIMIT 1000 - forward queries are flat (~3 ms) on both versions; on 4.5 reversed queries grow linearly with partition size and start failing around 100 MB, while on master they are flat, close to forward queries]
  • 91. [graph: query duration vs. partition size without LIMIT - forward queries grow linearly; reversed queries are quadratic on 4.5 and linear on master]
  • 92. Summary ■ Reversed queries in Scylla <= 4.5: ● time complexity quadratic w.r.t. the size of the queried range ● memory consumption linear w.r.t. the size of the queried range ■ The mc sstable format allows a better implementation ■ Reversed queries in the upcoming release: ● time complexity linear w.r.t. the size of the queried range ● memory consumption linear w.r.t. the page size
  • 93. Thank you! Stay in touch Kamil Braun kbraun@scylladb.com

Editor's Notes

  1. Hi, My name is Eliran, a SW Team Leader at ScyllaDB, and I am here to share some of the results of an activity we had in the past year aiming at reducing overload impact on Scylla.
  2. Overload is something that may happen even on a correctly configured cluster, and the main concern is not preventing it but keeping serving requests despite the overload, producing performance results that are as predictable as possible. When thinking about it, once we eliminate all of overload's obviously undesired effects like crashes, stalls and disconnections, we still have to deal with a simple fact of life: the cluster, at the moment, can't fulfill every request according to the user's expectations.
  3. Of course there is a whole class of solutions that works right out of the box, some of them presented in the past, the most popular being "throw more money at it"; a derivative of that approach is keeping a worst-case-sized DC for every workload. In the future we might have elasticity to deal with this at runtime, but even when we do, it is not guaranteed to solve all of our problems. We came to realize two things: In the real world, where we have limited amounts of money and processing power, overload is a fact of life, so it is better to optimize behaviour under overload than to aim for an overload-free environment. Sometimes the point at which this overload hurts the user can be stretched; moreover, there are some workloads that are not really hurt by overload but only affected by it. Workloads can have different properties depending on the client SW or user profile that drives them: they can have different kinds of parallelism (small vs. large, correlated vs. uncorrelated, etc.), latency distribution expectations, and some more subjective properties like priorities, which are more about user perspective.
  4. We have already introduced a means of dealing with at least the last example: workload prioritization. At its core, this feature aims to let the user specify the importance of workloads relative to each other; this in turn reduces the impact workloads have on each other by distributing some of the system's resources according to the user's preferences.
  5. This alone has the potential to balance between some workloads and mitigate some cross workload impact (like shown in this graph taken from our blogpost), it shows how workload prioritization can be used to mitigate the impact of analytics on the latency of interactive workload.
  6. Unfortunately, in our latest efforts we discovered that it is not enough; this relative way of expressing workload priorities is not expressive enough on its own. For starters, different workloads have different and explicit expectations about latencies, which means that mitigation is not enough. There are some configurations that quite apparently can't hold globally in multi-workload environments, timeout being one of the easiest to reason about; there are others and we will touch on one more later. In addition, some of the very advanced techniques that we use today can be further tweaked to accommodate this difference in workloads, our reader concurrency semaphore being one of them. (I will not get into the details of this semaphore here, but you are welcome to attend Botond's talk about it.)
  7. With this understanding in mind we started to try and figure out what is missing and how we can improve our behaviour under overload. We very quickly came to realise that before improving the behaviour, we should make sure that it is actually an improvement, meaning that, given that we are overloaded, the newly implemented behaviour is the one the user expects. After going over some real-life examples from our bugs and issues, we found that Scylla needs some hints about what the expected behaviour for a specific workload is, and so workload characterization was created.
  8. The following example is a classic use case where we are lacking the information to deal with overload. We want to support a simple webserver application with two workloads: the main workload consists of queries triggered by the user clicking or navigating to some areas of the website; the second workload is some analytics being run periodically, to collect some statistics or to aggregate some information to be presented to all users.
  9. The users behind the main workload expect high responsiveness, which translates to low latency, and it means they will have a short timeout. Another thing to notice is that you can't prevent users from just clicking over and over again because of what appears to them as the page being stuck. Failing to set a low enough timeout on the server side can also trigger a whole retry-avalanche effect that would appear on the server side as very high or unbounded concurrency.
  10. On the other hand we have the secondary workload, which makes a series of computations and can be, and probably is, designed with limited concurrency, which means that it can be controlled with methods like throttling. This workload is a lot less sensitive to latency and is more throughput oriented.
  11. As I mentioned, for the main workload it is more suitable to have a very small timeout, and for the secondary workload we need a large timeout to accommodate always-full queues. Even when the main interactive workload's timeout is configured, there is still the interactive user, who can't be configured; if they click over and over again there is little you can do about it except have an accounting mechanism in another layer of the system, which means more development effort. However, since there is only one server-side timeout configuration, and it should be less than the client one (or we can have a retry avalanche in the extreme case and wasted resources in the less extreme case), we can't optimize for both, and whatever choice we make will be suboptimal for one of the workloads.
  12. We also need to decide on a proper response to overload: for an interactive workload it is probably beneficial to fail early by shedding load (if we see that our in-flight requests are going to time out or are starting to pile up), while for analytics we should delay some responses (or even wait for a timeout to naturally happen on the client side), since this serves as a backpressure mechanism.
  13. There are a lot of useful characteristics that can hint Scylla about the expected behavior, not all of which are implemented.
  14. The webserver example demonstrates that there is some information about workloads that can help Scylla to behave better, and to stretch the overload limit further. It can help us utilize our cluster better and help us to reduce the administrative effort while providing us valuable metrics (such as timeout per workload) and better isolation capabilities. It is also beneficial to characterise workloads in order to size the cluster correctly and in the future it can also help us to employ elasticity in a smarter way.
  15. A way to express those workload properties has already existed in our enterprise version for a while, and it is now extended and backported to our OSS version as well. This concept is called a service level; service levels can be attached to roles. When a user logs into the system, all of the service levels that are attached to the user and their granted roles are combined to form the workload characteristics. In turn, Scylla tweaks its behaviour for requests that are sent in this session (which is now tagged with specific workload characteristics).
  16. Utilizing this in our webserver example: For the main workload, we need low timeouts with load shedding as our overload response, and we would like to have a lot of dedicated resources available whenever this workload needs them. For the secondary workload, we can have pretty large timeouts to accommodate always-full queues, we would like to throttle requests under load so the computation is stable and controllable, and finally we would like this workload to have very few dedicated resources, using mostly unused resources to achieve better cluster utilization.
  17. The aforementioned requirements can already be expressed in Scylla as shown here.
  18. This breakdown of the commands demonstrates how we would express our expectations for each workload, it is already fully implemented.
  19. Workload characterization is still work in progress, the service level mechanism gives us a way to easily add more advanced configuration options in the future. There are still a lot of future improvements that can be implemented on a per workload basis, but according to what we have learnt, per workload characterization is one of the cornerstones in utilizing the cluster in full on one hand and doing the right thing in the presence of overload on the other.
  20. Hi everyone. In this presentation I'm going to go over our recent improvements to the out-of-memory resilience of reads on the replica.
  21. My name is Botond and I'm a software engineer working at ScyllaDB since 2017, as a member of the storage team.
  22. We want to control the concurrency of reads on the replica, with the goal of keeping concurrency within a useful limit and to avoid resource exhaustion. This happens via a semaphore, which is dual-limited with count and memory resources. Each read consumes 1 count and fixed amount of memory on admission. As the read progresses its memory usage is tracked and is consumed from the semaphore's memory units. We have a separate semaphore for each scheduling group on each shard and semaphores are created with a fixed amount of count resources and an amount of memory that is some percentage of the shard's total amount of memory. 2% of the shard’s memory is a typical value. As for count, the user read semaphore has 100 counts and internal semaphores have 10 counts.
  23. We depend on the accuracy of the tracking of the memory consumption of reads to be able to determine whether a new read can be admitted or not. It is crucial that we track the memory consumption of as many aspects of reads as practical. This is an area that we've improved a lot recently: * The I/O buffer tracking had a bug where buffers were tracked only after the I/O completed, not when they were allocated. * We now track all buffers used in I/O and parsing, from the moment they are allocated. * We also track the internal buffers of readers. These improvements vastly improved the effectiveness of our memory-based concurrency control. There are still aspects of reads that are untracked, but extending the coverage has diminishing returns. For example, fixing the tracking of I/O buffers was a 3-line change and brought huge improvements, while tracking the buffers of readers was a lot of work and in some workloads the effect is not even noticeable.
  24. We have also addressed the most common causes of out-of-memory conditions directly. These are unpaged and reverse reads, or their combination for extra effect, both consuming unbounded amounts of memory internally. We have recently fixed this aspect of reverse reads -- but we have a separate presentation on that. We introduced a soft and hard limit pair for these kinds of reads: the replica will print a warning when the read's memory consumption reaches the soft limit and abort the read when it reaches the hard limit.
  25. Historically we only applied concurrency control to reads at the moment they had to go to disk. In these pictures you can see a reader tree for a typical read. On the top level we have a memtable reader and a cache reader, their output being merged by a combined reader. The cache represents the content of the disk, so on a cache miss the cache creates a disk reader and the read from disk happens through the cache, populating the cache with the read data in the process. We might read from more than one sstable when reading from the disk, in which case we again use a combined reader to merge their contents into a single stream. As you can see, the concurrency control, represented here by the "restricted reader", is injected between the cache and the disk and therefore is only activated when the disk readers are created. The dotted purple rectangle represents the readers covered by concurrency control. I suppose the assumption here was that in-memory reads complete very quickly and therefore don't need concurrency control. This assumption was proven false when we started seeing out-of-memory conditions caused by cache reads, so we had to "move" the semaphore to the very front. Reads now have to pass admission before the reader objects are even created.
  26. Despite all the improvements just discussed, problems can still happen. Previously, debugging out-of-memory and concurrency-semaphore-related bugs was only possible with coredumps. Coredumps are a pain to work with. First of all, a coredump has to be available, which is not always the case when memory runs out; oftentimes the only symptom is std::bad_alloc error messages being spammed in the logs. To help with investigating out-of-memory and concurrency semaphore related bugs we have improved the diagnostics around these. When memory runs out, scylla will dump a report about the state of its memory allocator. By default this only happens when critical allocations fail -- those that will eventually cause a crash -- but this can be configured to be dumped on any bad_alloc. We added a similar report dump to the concurrency semaphore: this is dumped when the semaphore times out or its queue overflows. These reports are dumped to the logs, which are easily obtainable, and can be used to kick-start the investigation. In some cases the report itself might be enough to pinpoint the bug on its own.
  27. With all the discussed improvements we are in quite a good place now. Out-of-memory crashes are actually quite rare now. We are also currently working on improving other weak spots that we know about. There might still be corner cases hiding, so we are not letting our guard down.
  28. Let's do a quick recap on sstable indexing in Scylla. The index is a data structure used during a query to narrow down the data file location according to the query restrictions. The sstable index has a complex representation on disk, but it can be conceptualized as a search tree, and that's how I will be depicting it for simplicity.
  29. The top of the tree corresponds to the partition key index. The bottom corresponds to the clustering key index. Each partition has its own clustering key index, which can vary in depth between partitions; it's shown as equal depth here for simplicity.
  30. The partition key index is actually divided into two levels, with the top level called the summary.
  31. The summary is stored in a separate file on disk and is always present in RAM. We inherited this from Cassandra, and the reason for the split as far as I know was to keep part of the partition index in RAM to speed up reads. The size of the summary is limited so that it fits in RAM
  32. The summary is stored in a separate file on disk and is always present in RAM. We inherited this from Cassandra, and the reason for the split as far as I know was to keep part of the partition index in RAM to speed up reads. The size of the summary is limited so that it fits in RAM
  33. Almost all, except: Keys known to be outside of the data file based on summary information Partition index pages are shared among concurrent readers
  34. The index has a hierarchy.
  35. The index has a hierarchy.
  36. Hi. Recently Scylla gained a new implementation of reversed queries, which I’m going to talk about in this presentation.
  37. My name is Kamil, I’m a software engineer at Scylla.
  38. A reversed query is a query in which the specified clustering key ordering is different from the ordering specified in the schema. There are two possible orderings: ascending and descending. If your schema was created with ascending order, a query with descending order is a reversed query, and vice-versa.
  39. For example, suppose your schema specifies ascending order. In your query, if you don’t specify the order or explicitly specify it to be ascending, you get a regular “forward” query and your data will be sorted according to the clustering key in ascending order.
  40. If you specify the order to be descending, this is a reversed query and your data will be sorted according to the clustering key in descending order.
  41. The old implementation of reversed queries had significant problems. I’ll illustrate it using sstable reads.
  42. Suppose we want to query a range of clustering keys: from 6 to 16.
  43. The old implementation would start by performing a regular forward query on this range, fetching it entirely into memory.
  44. It would then iterate over rows in the range in reverse and construct a page, then return it.
  45. All those rows fetched from the queried range which don’t belong to the first page are wasted work. We throw them away.
  46. When a second page is requested, we do the same thing, but with a smaller query range.
  47. This solution has a quadratic complexity problem. Suppose there are 10 pages of data. We would: Read pages 1 to 10 into memory, return page 10 Read pages 1 to 9 into memory, return page 9 … Read page 1 into memory, return page 1
  48. For N pages, we read N + (N-1) + (N-2) + … + 1 = O(N^2) pages! This could be improved by caching the result of the initial query after the first page, and getting back to it when the second page is requested. Unfortunately, we don’t know how long we’re going to wait for the second page, and in the meantime, the cache may be cleared or invalidated. And we may not even want to cache the result in the first place due to the second problem…
  49. Which is that the entire range may consume huge amounts of memory. Just to return that single page we need to fetch the entire queried range. It may not even fit in memory, causing the read to fail. With sufficiently large partitions and sufficiently large queries, this old implementation would simply not work.
  50. The new implementation solves these two problems. Both sstable and memtable reversed reads were improved. For illustration I’ll focus on sstables since that’s where most of the data resides and, in my opinion, where the most interesting changes happened.
  51. Back to the example. The queried range of clustering keys is 6 to 16. To return the first page, we first need to find the last row in the queried range, in this example with key 16. We consult the index to find the nearest row before 16 that it knows of. Let’s say that the index knows where 14 is.
  52. We fetch a chunk of data into memory, starting at the position given by the index.
  53. At this point it looks just like a forward query. We parse row 14 which gives us the position of row 15 - which we didn’t know until now because the sizes of rows are not constant in general. Now we can parse row 15. Then row 16.
  54. If there is any remaining data in the fetched buffer we can discard it, as we only care about the queried range.
  55. Now suppose we need more rows for our page. We fetch a chunk of data into memory, but this time before row 14.
  56. And now we face a problem: where does row 13 start in this buffer?
  57. Thankfully, the sstable mc format comes to the rescue. In the mc format (and newer ones), every row stores the size of the previous row. Which we use to learn the position of row 13 so we can parse it.
  58. We can continue like this, fetching more buffers if necessary, until we fill the page.
  59. Finally we can drop any unnecessary data and return the page reversed. Note: this is a simplified picture, but it gives the rough idea of how things work today when reading in reverse from mc sstables and newer formats. An important part of the implementation is the previous row size metadata which was not available in ka/la sstable formats. If your data is stored in these older formats, we keep using the old method.
  60. The new implementation features linear complexity, And much better memory consumption, proportional to the page size; we no longer need to fetch the entire queried range into memory at once.
  61. Reversed reads from memtables previously worked similarly to sstables: we would perform a forward read of the entire queried range, then return the actual requested page. Now we perform a direct reverse traversal of the memtable structure.
  62. Note that reversed queries are not allowed for range scans, only for single-partition queries.
  63. Now let’s look at some numbers. For the purposes of this presentation I did a very simple benchmark to see how this new implementation performs compared to the old one. I used a simple single-node setup on my laptop. Didn’t set up a larger cluster since most of the changes happened on the storage layer - in the code where we read sstables and memtables. I created partitions of different sizes: 10MB, 15MB, 20MB, and so on up to 110MB, and queried them forward and backward. I did this using the OS 4.5 branch and master branch.
  64. The schema had 2 integer columns for partition and clustering key, and one text column so I could insert larger rows; the only reason for this was to reduce the time to insert the data.
  65. For each partition I'm running these queries: with and without the ORDER BY clause, with BYPASS CACHE so we exercise our sstables, and in two versions: with and without a row limit. The row limit version makes the result fit in a single page (the specific number of rows is not that important). The no-limit version gives us the entire partition. I performed each query 10 times, took the mean and standard deviation, and plotted error bars. By the way, I'm using the Python driver, which is not the fastest of drivers, so it may cause a bit of overhead.
  66. On the X axis we see the partition size in MB, on the Y axis the query duration in milliseconds. This is a graph for the row-limit select, so we fetch only a single page. As we can see, forward query duration does not depend on the partition size; it's roughly constant, here about 3 ms. The results are pretty much the same on master and 4.5. But for reversed queries there's a significant difference. On 4.5 the duration of a reversed query increases linearly with the partition size, even though we're always fetching the same number of rows, and at around 100 MB reversed queries start to fail because there is a memory consumption limit which we exceed. On master, however, reversed queries behave similarly to forward queries: they are a bit worse, here the duration is about 3 ms, but it does not depend on the partition size.
  67. If we drop the row limit, this is what we get. The duration of forward queries increases linearly with the partition size, which is expected since we’re querying the whole partition. On 4.5, the duration of a reversed query is a quadratic function of the partition size, but on master, it’s again linear, as with forward queries.
  68. Summarizing: in Scylla 4.5 and older, reversed queries had quadratic time complexity w.r.t the size of the queried range, memory consumption was linear, for sufficiently large ranges the query would have to fail. Mc and newer sstable formats allow a better implementation. In upcoming release the time complexity of reversed queries is linear w.r.t the size of the queried range, and memory consumption is linear w.r.t page size, so even if your range is large, you can still perform reversed queries on it.