
Scylla Summit 2018: Worry-free ingestion - flow-control of writes in Scylla


When ingesting large amounts of data into a Scylla cluster, we would like the ingestion to proceed as quickly as possible, but not quicker. We explain how over-eager ingestion could result in a buildup of queues of background writes, possibly to the point of depleting available memory. We then explain how Scylla avoids this risk by automatically slowing down well-behaving applications to the best possible ingestion rate (“flow control”). For applications which cannot be slowed down, Scylla still achieves the highest possible throughput by quickly rejecting excess requests (“admission control”). In this talk we investigate the different causes of queue buildup during writes, including consistency levels lower than “ALL” and materialized views, and review the mechanisms which Scylla uses to automatically solve this problem.

Published in: Software


  2. Presenter bio Nadav Har'El has had a diverse 20-year career in computer programming and computer science. Among other things, he worked on high-performance scientific computing, networking software, information retrieval and data mining, virtualization and operating systems. Today he works on Scylla.
  4. Ingestion ▪ We look into ingestion of data into a Scylla cluster, i.e., a large volume of update requests (writes). ▪ We would like the ingestion to proceed as quickly as possible, but not quicker.
  5. “as quickly as possible, but not quicker”? ▪ An over-eager client may send writes faster than the cluster can complete earlier requests. ▪ We can absorb a short burst of requests in queues. ▪ But if we allow the client to continue writing at an excessive rate, the backlog of uncompleted writes grows until memory runs out!
  6. Goals of this talk ▪ Understand the different causes of queue buildup during writes. ▪ Understand the flow-control mechanisms Scylla uses to automatically make ingestion proceed at the optimal pace. ▪ A simulator for experimenting with different workloads and flow-control mechanisms.
  7. Dealing with an over-eager writer ▪ As the backlog grows, the server needs a way to tell the client to slow down its request rate. ▪ The CQL protocol does not offer an explicit flow-control mechanism for the server to slow down a client. ▪ Two options remain: delaying replies to the client’s requests, and failing them. ▪ Which one we can use depends on what drives the workload:
  8. Workload model 1: Fixed concurrency ▪ Batch workload: the application wishes to write a large batch of data as fast as possible (driving the server at 100% utilization). ▪ A fixed number of client threads, each running a request loop: prepare some data, make a write request, wait for the response. ▪ The server can control the request rate by throttling (delaying) its replies: if the server only sends N replies per second, the client will only send N new requests per second!
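The effect described above can be sketched in a toy discrete-time model (ours, not Scylla code): a closed-loop client with a fixed number of threads, where each idle thread immediately sends a new request and the server paces its replies.

```python
# Illustrative sketch, not Scylla code: a closed-loop ("batch") client with
# fixed concurrency. Each idle thread immediately sends a new request; the
# server throttles by replying to at most `replies_per_tick` requests per tick.

def closed_loop_requests(threads: int, replies_per_tick: int, ticks: int) -> int:
    """Total requests sent by the client over `ticks` time steps."""
    outstanding = 0    # requests the client is still waiting on
    sent = 0
    for _ in range(ticks):
        new = threads - outstanding    # every idle thread sends one request
        sent += new
        outstanding += new
        # Server paces replies; each reply frees a client thread.
        outstanding -= min(outstanding, replies_per_tick)
    return sent
```

After the initial burst of `threads` requests, the client settles to exactly `replies_per_tick` new requests per tick: the server's reply rate fully controls the client's request rate.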
  9. Workload model 2: Unbounded concurrency ▪ Interactive workload: the client sends requests driven by some external events (e.g., activity of real users). • The request rate is unrelated to the completion of previous requests. • If the request rate is above the cluster’s capacity, the server can’t slow these requests down, and the backlog grows and grows. • To avoid that, we must fail some requests. ▪ To reach optimal throughput, we should fail fresh client requests: admission control.
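The admission-control idea can be sketched as follows (a toy model; the class and names are ours, not Scylla's internal API):

```python
from collections import deque

class AdmissionQueue:
    """Toy sketch of admission control (not Scylla's implementation):
    fresh requests are rejected once the queue reaches max_backlog, so
    queued work stays bounded and successful throughput stays close to
    the server's capacity."""

    def __init__(self, max_backlog: int):
        self.max_backlog = max_backlog
        self.queue = deque()
        self.rejected = 0

    def submit(self, request) -> bool:
        """Admit the request, or fail it immediately if overloaded."""
        if len(self.queue) >= self.max_backlog:
            self.rejected += 1    # the client sees a quick "overloaded" error
            return False
        self.queue.append(request)
        return True
```

Failing fresh requests quickly, rather than queuing them, is what lets the server keep spending its resources on requests it can actually complete.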
  11. Regular writes (no MV) A coordinator node receives a write and: ▪ Sends it to RF (e.g., 3) replicas. ▪ Waits for CL (e.g., 2) of those writes to complete, ▪ Then replies to the client: “desired consistency level achieved”. ▪ The remaining (e.g., 1) replica writes continue “in the background”, without the client waiting.
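The steps above can be sketched with asyncio (an illustrative model, not Scylla's C++ code): start all RF replica writes, reply after the first CL complete, and leave the rest running in the background.

```python
import asyncio

async def coordinator_write(replica_writes, cl: int):
    """Illustrative sketch of the write path above (not Scylla's code):
    start all RF replica writes, return to the client as soon as CL of
    them complete, and leave the rest running in the background."""
    tasks = [asyncio.ensure_future(w) for w in replica_writes]
    completed = 0
    for finished in asyncio.as_completed(tasks):
        await finished
        completed += 1
        if completed == cl:
            break    # desired consistency level achieved: reply now
    # Writes still in flight continue without the client waiting on them.
    return [t for t in tasks if not t.done()]
```

With RF=3 and CL=2, one replica write typically remains in flight after the client has already received its reply.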
  12. Why background writes cause problems ▪ A batch workload, upon receiving the server’s reply, will send a new request before these background writes finish. ▪ If new writes come in faster than we complete background writes, the number of these background writes can grow without bound.
  13. Typical cause: a slower node ▪ 3 nodes, one slightly slower: two complete 10,000 wps, one only 9,900 wps (1% slower). ▪ RF=3, CL=2. ▪ Each second: • 10,000 writes are acknowledged as soon as the two faster nodes respond. • The backlog of background writes to the slowest node increases by 100.
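Spelling out the arithmetic above: with CL=2 the two fast replicas let the coordinator acknowledge 10,000 writes/sec, so the client keeps that pace, while the slowest replica completes only 9,900 writes/sec; its queue grows by the difference every second.

```python
# Worked arithmetic for the slower-node example above (illustrative).

def backlog_growth_per_sec(client_wps: int, slowest_replica_wps: int) -> int:
    """How fast the background-write backlog toward the slowest replica
    grows: the client's (CL-acknowledged) rate minus what the slowest
    replica can actually complete."""
    return client_wps - slowest_replica_wps

growth = backlog_growth_per_sec(10_000, 9_900)   # 100 writes/sec, unbounded
```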
  14. Regular writes - Scylla’s solution ▪ A simple, but effective, throttling mechanism: • When the total memory used by background writes exceeds some limit (10% of the shard’s memory), the coordinator only replies after all RF replica writes have completed (not just CL of them). • The backlog of background writes then stops growing. • Replies are only sent at the rate at which we can complete all the work, so a batch workload slows its requests to that same rate.
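The throttling rule above can be written as a one-line decision (the 10% threshold is from the talk; the function and parameter names are ours, not Scylla's API):

```python
def acks_to_wait_for(background_bytes: int, shard_memory_bytes: int,
                     rf: int, cl: int) -> int:
    """Sketch of the throttling rule above: normally the coordinator
    replies after CL replica acks; once background writes hold more than
    10% of the shard's memory it waits for all RF acks, so a reply never
    leaves new background work behind."""
    limit = shard_memory_bytes // 10    # 10% of shard memory (from the talk)
    return rf if background_bytes > limit else cl
```

Because replies are then paced by the slowest replica, a closed-loop batch client automatically slows to the rate at which all the work can be completed.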
  15. Solution in the slower-node example [simulation graph: the backlog stays bounded, max backlog = 300]
  17. Write to a table with materialized views ▪ As before: the coordinator sends writes to RF (e.g., 3) replicas, and waits for only the first CL (e.g., 2) of those writes to complete. ▪ Each replica later sends updates to the view table(s) - the client does not wait for these view updates. • A deliberate, though often-debated, design choice. • Needed for high availability - a base-replica update shouldn’t need many nodes to be available before finishing.
  18. Why view writes cause problems ▪ Again the problem is background work the client doesn’t await. ▪ There may be a lot of such work - with V views, we typically have 2*V of these background view writes. ▪ If new writes come in faster than we can finish view updates, the number of queued view updates can grow without bound.
  19. Simulation
  20. View updates - solution ▪ How do we prevent the view backlog from growing endlessly? ▪ Delay each client write by a small extra amount. • Higher delay: slows the client, view-update backlog declines. • Lower delay: speeds up the client, view-update backlog increases. ▪ Goal: a controller that changes the delay as the backlog changes, keeping the backlog in a desired range.
  21. View updates - solution ▪ A simple linear controller: • delay = α * backlog ▪ When client concurrency is fixed, this converges on a delay that keeps the backlog constant: • If the delay is higher than that, the client slows and the backlog declines, causing the delay to come back down (and vice versa).
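The convergence argument above can be checked with a toy simulation in the spirit of the talk's simulator (all parameters here are illustrative, not Scylla's): a closed-loop client whose per-request time is `base + delay`, where each write enqueues one view update and the view path drains a fixed number of updates per tick.

```python
def simulate_linear_controller(alpha: float, threads: int,
                               view_rate: float, ticks: int) -> list:
    """Toy simulation of delay = alpha * backlog (illustrative model).
    A closed-loop client with `threads` threads sends threads/(base+delay)
    writes per tick; each write queues one view update, and the view path
    drains `view_rate` updates per tick."""
    base = 0.001           # client's own per-request turnaround (assumed)
    backlog, delay = 0.0, 0.0
    history = []
    for _ in range(ticks):
        arrivals = threads / (base + delay)
        backlog = max(0.0, backlog + arrivals - view_rate)
        delay = alpha * backlog    # the linear controller
        history.append(backlog)
    return history

# Steady state: arrivals == view_rate, i.e. threads/(base + alpha*B) == view_rate,
# so the backlog settles at B = (threads/view_rate - base) / alpha.
history = simulate_linear_controller(alpha=1e-5, threads=8,
                                     view_rate=1000, ticks=200)
# with these numbers: B = (0.008 - 0.001) / 1e-5 = 700
```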
  22. Simulation
  23. View updates - solution variants ▪ The controller can use any function, not just a linear one: • delay = α * backlog₀ * f(backlog / backlog₀) ▪ The controller can also have an integral term - the delay is then a function not just of the current backlog but also of its history: • For example, change α to move the backlog from the arbitrary length it settled on to a chosen length.
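The "change α" idea above can be made concrete (illustrative model and names, not Scylla's): with delay = α * backlog and a fixed-concurrency client, the settling point is inversely proportional to α, so rescaling α moves the backlog to any chosen target.

```python
def settled_backlog(alpha: float, threads: float,
                    view_rate: float, base: float) -> float:
    """Steady state of delay = alpha * backlog for a closed-loop client:
    the backlog B where arrivals threads/(base + alpha*B) equal view_rate.
    (Illustrative model; parameter names are ours.)"""
    return (threads / view_rate - base) / alpha

def retune_alpha(alpha: float, settled: float, target: float) -> float:
    """The settling point is inversely proportional to alpha, so an
    integral-style adjustment can rescale alpha to move the backlog from
    wherever it settled to a chosen target length."""
    return alpha * settled / target
```

For example, if the backlog settled at 700 but we want 1000, lowering α by the factor 700/1000 moves the fixed point exactly there.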
  24. Simulation
  25. Conclusions ▪ Scylla flow-controls a batch workload: • data is ingested as fast as possible - but not faster than that. ▪ When the client cannot be slowed down to the optimal rate, • Scylla starts dropping new requests, achieving the highest possible throughput of successful writes. ▪ All of this is automatic - no user intervention or configuration needed.
  26. Thank you! Any questions? Please stay in touch: @nyharel