Keeping Latency Low and Throughput High with Application-level Priority Management
Avi Kivity
Co-founder and CTO at ScyllaDB
Creator and ex-maintainer of Kernel-based Virtual Machine (KVM)
Creator of the Seastar I/O framework
Comparing throughput and latency
Throughput computing (~ OLAP)
■ Want to maximize utilization
■ Extensive buffering to hide device/network latency
■ Total time is important
■ Fewer operations; serialization is permissible

Latency computing (~ OLTP)
■ Leave free cycles to absorb bursts
■ Cannot predict which data will be read
■ Often must write synchronously
■ Individual operation time is important
■ Many operations execute concurrently
Why mix throughput and latency computing?
■ Run different workloads on the same data - HTAP
● Fewer resources than dedicated clusters
■ Maintenance operations on an OLTP workload
● Garbage collection
● Grooming a Log-Structured Merge Tree (LSM Tree)
● Cluster maintenance - add/remove/rebuild/backup/scrub nodes
General plan
1. Isolate/identify different tasks
2. Schedule tasks
Isolating tasks in threads
■ Each operation becomes a thread
● Perhaps temporarily borrowed from a thread pool
■ Let the kernel schedule these threads
■ Influence kernel choices with priority
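A minimal sketch of this model (Linux-specific; handle_query and handle_compaction are invented placeholders): each operation gets its own thread, and the only lever for influencing the kernel is a nice value.

```cpp
#include <sys/resource.h>   // setpriority
#include <cstdio>
#include <thread>

void handle_query()      { std::puts("serving a query"); }          // latency work
void handle_compaction() { std::puts("compacting in background"); } // throughput work

int main() {
    std::thread latency([] {
        // Linux-specific: setpriority(PRIO_PROCESS, 0, ...) acts on the
        // calling thread; negative nice values require privileges.
        setpriority(PRIO_PROCESS, 0, -5);
        handle_query();
    });
    std::thread throughput([] {
        setpriority(PRIO_PROCESS, 0, 10);  // deprioritize bulk work
        handle_compaction();
    });
    latency.join();
    throughput.join();
}
```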
Isolating tasks in threads
Advantages
■ Well understood
■ Large ecosystem
Disadvantages
■ Context switches are expensive
■ Communicating priority to the OS is hard
● Priority levels are not meaningful
■ Locking becomes complex and expensive
■ Priority inversion is possible
■ Kernel scheduling granularity may be too high
Application-level task isolation
■ Every operation is a normal object
■ Operations are multiplexed on a small number of threads
● Ideally one thread per logical core
● Both throughput and latency tasks on the same thread!
■ Concurrency framework assigns tasks to threads
■ Concurrency framework controls order
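A minimal sketch of this model, using invented names (task, task_queue, core_scheduler) rather than Seastar's actual API:

```cpp
#include <deque>
#include <functional>
#include <vector>

// An operation is a plain object, not a kernel thread.
struct task {
    std::function<void()> run;
};

// Tasks of one scheduling class (e.g. "query", "compaction") share a queue.
struct task_queue {
    unsigned shares;          // relative CPU weight used by the scheduler
    std::deque<task> tasks;   // FIFO order within the queue
};

// One instance per logical core: throughput and latency queues live on
// the same thread, and the framework decides what runs next.
struct core_scheduler {
    std::vector<task_queue> queues;
    void submit(unsigned q, task t) {
        queues[q].tasks.push_back(std::move(t));
    }
};
```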
Application-level task isolation
Advantages
■ Full control
■ Low overhead with cooperative scheduling
■ Many locks become unnecessary
■ Good CPU affinity
■ Fewer surprises from the kernel
Disadvantages
■ Full control
■ Less mature ecosystem
Application-managed tasks
[Diagram: a per-core scheduler feeds task queues tq1, tq2, tq3, …, tqn; the execution timeline shows slices of tq1, tq2, tq3 interleaved over time.]
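A sketch of one slice of that timeline: run tasks from the chosen queue until it empties or its timeslice expires, then return so the scheduler can poll I/O and pick the next queue. (Reading the clock per task is shown only for simplicity; the preemption slides below cover the cheap check used in practice.)

```cpp
#include <chrono>
#include <deque>
#include <functional>

using task = std::function<void()>;

void run_queue_slice(std::deque<task>& queue,
                     std::chrono::steady_clock::duration slice) {
    auto deadline = std::chrono::steady_clock::now() + slice;
    while (!queue.empty() && std::chrono::steady_clock::now() < deadline) {
        task t = std::move(queue.front());
        queue.pop_front();
        t();  // cooperative: a long task must contain its own preemption checks
    }
}
```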
Switching queues
■ When queue is exhausted
● Common for latency-sensitive queues
■ When time slice is exhausted
● Throughput oriented queues
● Queue may have more tasks
● Tasks can be preempted
■ Poll for I/O
● io_uring_enter or equivalent
■ Make scheduling decision
● Pick next queue
● Scheduling goal is to keep q_runtime / q_shares equal across queues
● Selection of queue is not round-robin
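A sketch of that decision, with an invented queue_state type: always pick the runnable queue with the lowest runtime/shares ratio, which equalizes q_runtime / q_shares across queues over time.

```cpp
#include <chrono>
#include <vector>

struct queue_state {
    std::chrono::nanoseconds runtime{0};  // CPU time consumed so far
    unsigned shares = 1;                  // relative weight
    bool runnable = false;
};

// Not round-robin: an under-served queue wins regardless of position.
int pick_next(const std::vector<queue_state>& qs) {
    int best = -1;
    double best_ratio = 0.0;
    for (int i = 0; i < (int)qs.size(); ++i) {
        if (!qs[i].runnable) continue;
        double ratio = double(qs[i].runtime.count()) / qs[i].shares;
        if (best == -1 || ratio < best_ratio) {
            best = i;
            best_ratio = ratio;
        }
    }
    return best;  // -1: nothing runnable, go poll for I/O
}
```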
Preemption techniques
■ Read clock and compare to timeslice end deadline
● Prohibitively expensive
■ Use timer+signal
● Works, but locking from a signal handler is icky
■ Use kernel timer to write to user memory location
● linux-aio or io_uring
● Tricky but very efficient
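A sketch of the memory-write technique. Here a plain helper thread stands in for the kernel timer that linux-aio/io_uring would arm; the point is the hot path, where checking for preemption is a single relaxed load instead of a clock read.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// The word the timer writes to; in the real mechanism the kernel
// performs this write.
std::atomic<bool> preempt_requested{false};

void arm_timeslice(std::chrono::milliseconds slice) {
    std::thread([slice] {
        std::this_thread::sleep_for(slice);
        preempt_requested.store(true, std::memory_order_relaxed);
    }).detach();
}

// Hot-path check: far cheaper than clock_gettime on every task.
inline bool need_preempt() {
    return preempt_requested.load(std::memory_order_relaxed);
}
```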
Stall detector
■ Signal-based mechanism to detect where you “forgot” to add a preemption check
■ cf. Accidentally Quadratic
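A simplified sketch of such a detector using a POSIX timer and SIGALRM (an illustration, not ScyllaDB's exact mechanism): re-arm a one-shot timer at every preemption point; if it ever fires, some task ran past the threshold without reaching a check.

```cpp
#include <csignal>
#include <ctime>
#include <unistd.h>

static timer_t stall_timer;

// Fires only when a preemption check was missed for too long;
// real code would capture and log a backtrace here.
extern "C" void on_stall(int) {
    const char msg[] = "stall detected: missing preemption check?\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
}

void init_stall_detector() {
    struct sigaction sa{};
    sa.sa_handler = on_stall;
    sigaction(SIGALRM, &sa, nullptr);
    timer_create(CLOCK_MONOTONIC, nullptr, &stall_timer);  // defaults to SIGALRM
}

// Called at every preemption point to push the deadline out again.
void rearm_stall_detector(long threshold_ms) {
    itimerspec its{};   // one-shot: it_interval stays zero
    its.it_value.tv_sec  = threshold_ms / 1000;
    its.it_value.tv_nsec = (threshold_ms % 1000) * 1000000L;
    timer_settime(stall_timer, 0, &its, nullptr);
}
```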
Implementation in ScyllaDB
About ScyllaDB
■ Distributed OLTP NoSQL Database
■ Compatibility
● Apache Cassandra (CQL, Thrift)
● AWS DynamoDB (JSON/HTTP)
● Redis (RESP)
■ ~10X performance on same hardware
■ Low latency, esp. higher percentiles
■ C++20, Open Source
■ Fully asynchronous; Seastar!
Dynamic Shares Adjustment
■ Internal feedback loops to balance competing loads
[Diagram: the Seastar scheduler multiplexes memtable flush, compaction, query, repair, and commitlog queues onto the CPU, with I/O flowing to the SSD and WAN; a compaction backlog monitor and a memory monitor adjust queue priorities.]
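A sketch of one such loop, with invented constants: a monitor periodically maps the normalized compaction backlog to scheduler shares, so compaction gets more CPU as it falls behind and less as it catches up.

```cpp
// backlog is normalized to [0, 1]: 0 = fully caught up,
// 1 = about to stall incoming writes.
struct backlog_controller {
    float min_shares = 50;
    float max_shares = 1000;

    float shares_for(float backlog) const {
        return min_shares + backlog * (max_shares - min_shares);
    }
};

// A monitor would run something like this periodically:
//   scheduler.set_shares(compaction_queue,
//                        controller.shares_for(measure_backlog()));
```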
Resource partitioning (QoS)
■ Provide different quality of service to different users
[Diagram: same architecture, with separate Query 1 and Query 2 queues so each user class runs in its own scheduling group.]
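A sketch of that partitioning, with illustrative share values: each user class gets its own scheduling group, and the share ratio only matters under contention; an idle group's capacity flows to the others.

```cpp
#include <map>
#include <string>

struct scheduling_group {
    unsigned shares;  // relative weight under contention
};

// Two query classes sharing the same node and the same data:
std::map<std::string, scheduling_group> service_levels = {
    {"query-1", {800}},  // premium tenant: ~80% of contended CPU
    {"query-2", {200}},  // best-effort tenant: ~20% under contention
};
```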
I/O scheduling
■ Logically, the same as CPU scheduling
■ But the scheduled entity (a disk) is much more complicated than a CPU core
■ More difficult cross-core coordination
■ More in Pavel’s talk
● “What We Need to Unlearn about Persistent Storage”
Avi Kivity
@AviKivity
