Keeping Latency Low and Throughput High with Application-level Priority Management

Throughput and latency are at a constant tension. ScyllaDB CTO and co-founder Avi Kivity will show how high throughput and low latency can both be achieved in a single application by using application-level priority scheduling.

1. Brought to you by ScyllaDB
   Keeping Latency Low and Throughput High with Application-level Priority Management
   Avi Kivity, CTO at ScyllaDB
2. Avi Kivity, CTO at ScyllaDB
   ■ Creator and ex-maintainer of the Kernel-based Virtual Machine (KVM)
   ■ Creator of the Seastar I/O framework
   ■ Co-founder and CTO @ ScyllaDB
3. Comparing throughput and latency
   Throughput computing (~ OLAP)
   ■ Want to maximize utilization
   ■ Extensive buffering to hide device/network latency
   ■ Total time is important
   ■ Fewer operations; serialization is permissible
   Latency computing (~ OLTP)
   ■ Leave free cycles to absorb bursts
   ■ Cannot predict which data will be read; often must write synchronously
   ■ Individual operation time is important
   ■ Many operations execute concurrently
4. Why mix throughput and latency computing?
   ■ Run different workloads on the same data (HTAP)
     ● Fewer resources than dedicated clusters
   ■ Maintenance operations on an OLTP workload
     ● Garbage collection
     ● Grooming a Log-Structured Merge Tree (LSM Tree)
     ● Cluster maintenance: add/remove/rebuild/backup/scrub nodes
5. General plan
   1. Isolate/identify the different tasks
   2. Schedule the tasks
6. Isolating tasks in threads
   ■ Each operation becomes a thread
     ● Perhaps temporarily borrowed from a thread pool
   ■ Let the kernel schedule these threads
   ■ Influence kernel choices with priority
7. Isolating tasks in threads
   Advantages
   ■ Well understood
   ■ Large ecosystem
   Disadvantages
   ■ Context switches are expensive
   ■ Communicating priority to the OS is hard
     ● Priority levels are not meaningful
   ■ Locking becomes complex and expensive
   ■ Priority inversion is possible
   ■ Kernel scheduling granularity may be too coarse
8. Application-level task isolation
   ■ Every operation is a normal object
   ■ Operations are multiplexed on a small number of threads
     ● Ideally one thread per logical core
     ● Both throughput and latency tasks on the same thread!
   ■ The concurrency framework assigns tasks to threads
   ■ The concurrency framework controls execution order
9. Application-level task isolation
   Advantages
   ■ Full control
   ■ Low overhead with cooperative scheduling
   ■ Many locks become unnecessary
   ■ Good CPU affinity
   ■ Fewer surprises from the kernel
   Disadvantages
   ■ Full control
   ■ Less mature ecosystem
10. Application-managed tasks
    [Diagram: a scheduler feeding task queues tq1, tq2, tq3 … tqn]
11. Execution timeline
    [Diagram: a timeline in which queues tq1, tq2, tq3 each receive time slices in turn]
12. Switching queues
    ■ When the queue is exhausted
      ● Common for latency-sensitive queues
    ■ When the time slice is exhausted
      ● Throughput-oriented queues
      ● The queue may have more tasks
      ● Tasks can be preempted
    ■ Poll for I/O
      ● io_uring_enter or equivalent
    ■ Make a scheduling decision
      ● Pick the next queue
      ● The scheduling goal is to keep q_runtime / q_shares equal across queues
      ● Queue selection is not round-robin
13. Preemption techniques
    ■ Read the clock and compare to the timeslice-end deadline
      ● Prohibitively expensive
    ■ Use a timer + signal
      ● Works, but the locking is icky
    ■ Use a kernel timer to write to a user memory location
      ● linux-aio or io_uring
      ● Tricky but very efficient
14. Stall detector
    ■ A signal-based mechanism to detect where you “forgot” to add a preemption check
    ■ cf. Accidentally Quadratic
15. Implementation in ScyllaDB
16. About ScyllaDB
    ■ Distributed OLTP NoSQL database
    ■ Compatibility
      ● Apache Cassandra (CQL, Thrift)
      ● AWS DynamoDB (JSON/HTTP)
      ● Redis (RESP)
    ■ ~10x performance on the same hardware
    ■ Low latency, especially at the higher percentiles
    ■ C++20, open source
    ■ Fully asynchronous; Seastar!
17. Dynamic shares adjustment
    ■ Internal feedback loops balance competing loads
    [Diagram: the Seastar scheduler runs memtable, compaction, query, repair, and commitlog queues over the CPU, SSD, and WAN; a compaction backlog monitor and a memory monitor adjust the queues' priorities]
18. Resource partitioning (QoS)
    ■ Provide a different quality of service to different users
    [Diagram: the same feedback loops as before, with the query queue split into Query 1 and Query 2 for separate users]
19. I/O scheduling
    ■ Logically the same
    ■ But the scheduled entity is much more complicated than a CPU core
    ■ More difficult cross-core coordination
    ■ More in Pavel’s talk: “What We Need to Unlearn about Persistent Storage”
20. Brought to you by ScyllaDB
    Avi Kivity, @AviKivity
