Making the Most Out of ScyllaDB's Awesome Concurrency at Optimizely

  1. Making the Most Out of ScyllaDB’s Awesome Concurrency at Optimizely. Brian Taylor, Principal Engineer
  2. Brian Taylor ■ I am married with 3 young children ■ I have created 2 programming languages and 2 databases for legitimate business reasons ■ I love discovering some property in the solution space that, when maintained, makes everything simpler
  3. Presentation Agenda ■ ScyllaDB Loves Concurrency ■ To a point ■ How to keep the wheels on the bus ■ Mommy, where does concurrency come from? ■ Easy way ■ Good way
  4. ScyllaDB Loves Concurrency
  5. Conceptual Model [Diagram: a client sends a request across the network into a queue ahead of execution; S is the service time spent in execution, R is the full request time the client sees, including queues and both network hops.]
  6. Why Does Concurrency Matter? It lets us hide round trip latency! [Diagram: four requests pipelined against one server.] Reff = Rtotal / 4 = (R + 3S) / 4
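Spelling out the slide's arithmetic: when four requests are pipelined, the first response arrives after a full request time R and each of the remaining three arrives roughly one service time S later, so

$$R_{\text{total}} = R + 3S, \qquad R_{\text{eff}} = \frac{R_{\text{total}}}{4} = \frac{R + 3S}{4}$$

When S is small relative to the network round trip, Reff approaches R/4: one round trip is amortized across all four in-flight requests.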
  7. Definitions ■ Throughput (X): measured in things per second (what you probably care about) ■ Concurrency (N): number of independent requests happening at the same time (the tool of this talk) ■ Service time (S): how long ScyllaDB says a thing took ■ Request time (R): how long something takes from the client’s perspective; R = S + round trip time ■ X = 1/R (pure sequential); X = N/R (in the linear region); X = maxX (in the saturation region); X = 💩 (in the retrograde region)
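To make X = N/R concrete with illustrative numbers (mine, not measurements from the talk): suppose R = 1.5 ms. A purely sequential client achieves

$$X = \frac{1}{R} = \frac{1}{1.5\ \text{ms}} \approx 667\ \text{ops/s},$$

while the same client with N = 60 requests in flight in the linear region achieves

$$X = \frac{N}{R} = \frac{60}{1.5\ \text{ms}} = 40{,}000\ \text{ops/s},$$

a 60x improvement without the database doing anything differently per request.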
  8. Classic Closed-Loop Load Testing [Diagram: users issue requests to the SUT and wait for responses.] ■ Users can have up to one request in flight at any time ■ Users must receive the response to their request before issuing the next request ■ Not directly useful for modern capacity planning
  9. Universal Scaling Law. A generalization of Amdahl’s Law discovered by Dr. Neil Gunther. As the number of users (N) increases, the system throughput (X) will: ■ Enjoy a period of near-linear scaling ■ Eventually saturate some resource such that increasing N doesn’t increase X. This defines maxX ■ Possibly encounter a coordination cost that drives down X with further increasing N [Plot: scylla-bench, 3× i3.large, average throughput vs. concurrency, annotated with the linear, saturation, and retrograde regions and maxX.]
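The formula itself does not appear on the slide; for reference, Gunther's USL (a standard statement, not specific to this talk) gives throughput as a function of concurrency:

$$X(N) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

where γ is single-user throughput, α is the contention (serialization) cost that produces the saturation region, and β is the coherency (crosstalk) cost that produces the retrograde region. With β = 0 this reduces to Amdahl's Law; with α = β = 0 it is pure linear scaling, X = γN.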
  10. Modern Open-Loop Load Testing [Diagram: a constant-throughput source drives the SUT.] Does not model users or think times; instead models load as a constant-throughput source. A good match for capacity planning internet-connected systems, where we typically know requests per second but don’t really care how many users are behind that load. ■ The start time of every request in the test is pre-ordained and does not depend on how the SUT is handling previous requests ■ Concurrency is theoretically unbounded https://github.com/optimizely/scylla-bench-crate
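A minimal sketch of an open-loop load source in Rust with Tokio (illustrative only, not the scylla-bench-crate implementation; `send_request` is a hypothetical stand-in for a real ScyllaDB query):

```rust
use std::time::Duration;
use tokio::time::{interval, MissedTickBehavior};

async fn send_request() {
    // Stand-in for a real ScyllaDB query; the sleep simulates request time R.
    tokio::time::sleep(Duration::from_millis(1)).await;
}

#[tokio::main]
async fn main() {
    let target_ops: u64 = 4_000; // requests per second (X), chosen up front
    let mut ticker = interval(Duration::from_nanos(1_000_000_000 / target_ops));
    // Open loop: if the SUT falls behind, fire the missed requests anyway
    // rather than silently slowing down -- concurrency (N) is unbounded.
    ticker.set_missed_tick_behavior(MissedTickBehavior::Burst);

    for _ in 0..target_ops * 10 {
        // Every start time is pre-ordained by the ticker, independent of
        // how the SUT handled previous requests.
        ticker.tick().await;
        tokio::spawn(send_request());
    }
}
```

The contrast with closed-loop testing is the absence of any wait-for-response: nothing here blocks the next request on the previous one completing.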
  11. Relating the Testing Models. Closed-loop testing: choose concurrency (N). Open-loop testing: choose throughput (X). ■ X in the linear region implies a bounded N (and R is very stable) ■ X in the saturation region can have unbounded N, and R is very chaotic ■ When X exceeds saturation, N and R are undefined because the system will not keep up ■ The USL curve is not a single-valued function of X: a given X can correspond to two very different values of N, which has interesting implications as X approaches saturation
  12. Linear Region [Conceptual-model diagram] At 4 kops/s: R99 = 1.487 ms, 2 <= N <= 22. At 40 kops/s: R99 = 2.527 ms, 23 <= N <= 255. S99 = 0.634 ms, independent of kops/s.
  13. Linear Region. Throughput is directly proportional to concurrency ■ The size of the cluster (in shards) and its aggregate SSD throughput determine how large the linear region is ■ You should engineer your system to take full advantage of the linear region [Chart: SSD throughput allocation at 4 kops/s vs. 40 kops/s]
  14. Saturation Region [Conceptual-model diagram] At 100 kops/s: R99 = 17.535 ms, 295 <= N <= 2540. S99 is increasing with runtime.
  15. Saturation Region. Throughput is approximately constant, regardless of concurrency ■ At this point, assuming a well-tuned cluster and workload, we are writing to disk as fast as the disks can take it ■ Concurrency is no longer a mechanism to increase throughput and is now becoming a risk to stability ■ You should engineer your system to stay out of the saturation region [Plot: the 100 kops/s run on the throughput vs. concurrency curve, with the linear, saturation, and retrograde regions marked]
  16. Saturation Region: SSD Throughput Allocation [Diagram: in the saturation region, foreground execution shares SSD throughput with background execution: SSTable flushes, the commit log, SSTable merges, and accumulating compaction debt.]
  17. Retrograde Region [Conceptual-model diagram] At 100 kops/s: R99 = 55.8 s, 75 <= N <= 4096. Once something bounces us into the retrograde region, S99 becomes 1000x worse than linear.
  18. Retrograde Region. Increasing concurrency now decreases throughput. A system that enters this region is likely to get stuck here until demand declines ■ The harder we push, the less throughput we get, and the more demand builds up, which makes us want to push harder ■ “Pushing harder” consumes more client resources (threads, futures, ScyllaDB driver state). The road to hell will terminate with an OOM unless there’s some other limiting factor [Plot: the 100 kops/s run sliding down the retrograde region, labeled “road to hell”]
  19. What Have We Learned? Stay in the linear region and you’ll enjoy consistent latencies and bounded concurrency. Stray into saturation and you’re likely to get stuck in the retrograde region until load subsides. ■ Scale ScyllaDB such that you’re “always” going to be operating in the linear region for your expected load ■ Design concurrency-limiting mechanisms that keep you out of the retrograde region during unexpected spikes in load (see the sketch below) ■ If you have work to do and can do it in the linear region: DO IT
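One common concurrency-limiting mechanism, sketched here with a Tokio semaphore (my illustration, not necessarily what Optimizely ships; `do_write` and `MAX_IN_FLIGHT` are placeholders): cap in-flight requests at a level sized for the linear region, so a load spike queues cheaply at the client instead of ballooning N at the database.

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

// Sized so that worst-case N stays inside the cluster's linear region.
const MAX_IN_FLIGHT: usize = 256;

async fn do_write(key: u32) {
    // Stand-in for a real ScyllaDB write.
    let _ = key;
    tokio::time::sleep(Duration::from_millis(1)).await;
}

#[tokio::main]
async fn main() {
    let limiter = Arc::new(Semaphore::new(MAX_IN_FLIGHT));
    let mut handles = Vec::new();

    for key in 0..10_000u32 {
        // Suspends (asynchronously) once MAX_IN_FLIGHT permits are out, so
        // a demand spike waits here instead of growing N at ScyllaDB.
        let permit = limiter.clone().acquire_owned().await.unwrap();
        handles.push(tokio::spawn(async move {
            do_write(key).await;
            drop(permit); // return the slot when the request completes
        }));
    }
    for h in handles {
        h.await.unwrap();
    }
}
```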
  20. Where Does Concurrency Come From?
  21. Mechanisms of Concurrency. These are the boring code-wonk answers. ■ Threads ■ Cheap end: 8 KB per goroutine; low thousands is reasonable ■ Expensive end: 1 MB per Java thread; low hundreds is reasonable ■ Reactors ■ Rust Tokio, Java futures, Seastar, Node.js callbacks: typically <1 KB per instance; tens of thousands is reasonable ■ Nodes ■ $$-limited
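A toy demonstration of why reactor-style concurrency is so cheap, assuming Tokio: each task is a small heap-allocated state machine rather than an OS thread with its own stack, so tens of thousands are unremarkable.

```rust
#[tokio::main]
async fn main() {
    // 50,000 OS threads would be painful; 50,000 Tokio tasks are routine.
    let handles: Vec<_> = (0..50_000u64)
        .map(|i| tokio::spawn(async move { i }))
        .collect();

    let mut sum = 0u64;
    for h in handles {
        sum += h.await.unwrap();
    }
    println!("completed 50000 tasks, sum = {sum}");
}
```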
  22. Mother of Concurrency. Data dependency is the mother of sound concurrency ■ Easy: no dependency, e.g. logging facts at independent keys; write-only ■ Medium: partitionable dependency, e.g. maintaining latest state at a single key; as long as we process each independent stream sequentially, everything will be fine ■ Hard: arbitrary “happens-before” relationships, e.g. adding a relationship between two nodes in a graph
  23. Definitions ■ Command: represents an atomic unit of work. Contains IOPs. Always concludes with a write; may contain reads ■ IOP: IO operation; a unit of work for the database ■ Batch: a group of IOPs that may be executed concurrently. Write IOPs within a batch may literally be combined into a batch operation to ScyllaDB. Batches execute sequentially with respect to other batches ■ Slot: a cubby for data. Has a name. Can be read or written (partition + clustering key in ScyllaDB) ■ Concurrency strategy: how we group IOPs into batches such that the final slot state is consistent with the commands all having been executed sequentially
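These definitions translate almost directly into types. A hypothetical Rust rendering (the names are mine, not from the talk's codebase):

```rust
/// A slot: a named cubby for data (a partition + clustering key in ScyllaDB).
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct Slot(pub String);

/// An IOP: one unit of work for the database.
#[derive(Clone, Debug)]
pub enum Iop {
    Read { slot: Slot },
    Write { slot: Slot, value: Vec<u8> },
}

/// A command: an atomic unit of work. May contain reads; always concludes
/// with a write.
#[derive(Clone, Debug)]
pub struct Command {
    pub iops: Vec<Iop>,
}

/// A batch: IOPs that may execute concurrently. Batches execute
/// sequentially with respect to one another.
pub type Batch = Vec<Iop>;

/// A concurrency strategy groups commands' IOPs into batches such that the
/// final slot state matches some sequential execution of the commands.
pub trait ConcurrencyStrategy {
    fn plan(&self, commands: &[Command]) -> Vec<Batch>;
}
```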
  24. No Dependency [Diagram: five commands write five distinct slots (A–E), split across two batches.] When commands contain no reads and always write to different slots, there can be no data dependency. The decision about when to switch from batch 1 to batch 2 can be arbitrary, or driven by a desire to minimize latency, or to work within the ScyllaDB batch size constraint.
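A minimal sketch of executing this case, assuming Tokio and the futures crate (`write_slot` is a stand-in for a real ScyllaDB insert): all writes within a batch fly concurrently, while batches run one after another.

```rust
use futures::future::join_all;
use std::time::Duration;

async fn write_slot(slot: &str, value: u64) {
    // Stand-in for a real ScyllaDB INSERT keyed by `slot`.
    let _ = (slot, value);
    tokio::time::sleep(Duration::from_millis(1)).await;
}

#[tokio::main]
async fn main() {
    // Two batches of independent, write-only commands (the batch cut is
    // arbitrary, as the slide says).
    let batches = vec![
        vec![("A", 1u64), ("B", 2), ("C", 3)],
        vec![("D", 4), ("E", 5)],
    ];

    for batch in batches {
        // No data dependency inside a batch: issue every write at once.
        join_all(batch.into_iter().map(|(slot, v)| write_slot(slot, v))).await;
        // The next batch starts only after this one fully completes.
    }
}
```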
  25. Read/Modify/Write and “Happens Before”. For a read/modify/write (RMW) operation to yield correct results, reads must be able to observe the writes of prior RMWs. Most streaming platforms (Storm, Flink, Kafka Streams, Spark) trivially solve this by partitioning commands into guaranteed-independent streams. This means that: ■ Every command has a happens-before relationship with every following command for the same partition key ■ Cross-command concurrency is impossible within a partition ■ Concurrency is limited by the cardinality of the partition key
  26. Partitioned Concurrency [Diagram: Tenant A’s and Tenant B’s commands spread across batches 1–4, with a happens-before edge between each tenant’s consecutive commands.] The work for each tenant is executed strictly in the order it was received. This guarantees that reads will always see prior writes, but misses opportunities for greater concurrency by being ignorant of non-interacting slot usage.
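One common way to realize partitioned concurrency, sketched with Tokio channels (my illustration, not the talk's implementation): one queue and one worker per tenant, so ordering within a tenant is strict while different tenants proceed concurrently.

```rust
use std::collections::HashMap;
use std::time::Duration;
use tokio::sync::mpsc;

#[derive(Debug)]
struct Command {
    tenant: String,
    payload: u64,
}

async fn run_command(cmd: Command) {
    // Stand-in for the command's reads and its concluding write.
    tokio::time::sleep(Duration::from_millis(1)).await;
    println!("tenant {} ran command {}", cmd.tenant, cmd.payload);
}

#[tokio::main]
async fn main() {
    let commands = vec![
        Command { tenant: "A".into(), payload: 1 },
        Command { tenant: "B".into(), payload: 2 },
        Command { tenant: "A".into(), payload: 3 }, // must run after A's 1
    ];

    // One queue + one worker per partition key (tenant).
    let mut workers: HashMap<String, mpsc::Sender<Command>> = HashMap::new();
    for cmd in commands {
        let tx = workers.entry(cmd.tenant.clone()).or_insert_with(|| {
            let (tx, mut rx) = mpsc::channel::<Command>(1024);
            tokio::spawn(async move {
                // Strictly sequential per tenant: reads trivially observe
                // this tenant's prior writes.
                while let Some(cmd) = rx.recv().await {
                    run_command(cmd).await;
                }
            });
            tx
        });
        tx.send(cmd).await.unwrap();
    }

    drop(workers); // close the queues so the workers drain and exit
    tokio::time::sleep(Duration::from_millis(100)).await; // let output flush
}
```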
  27. Golden Rules of Data Dependency ■ Reads of a slot must be able to observe any writes to that slot that came before them: writes create a happens-before relationship with any reads that follow them ■ The final value in a slot must reflect the last write in the sequence: writes create a happens-before relationship with any writes that follow them
  28. Golden Rule Happens-Before [Diagram: the same tenant commands as the previous slide, now compressed from four batches to three.] By examining slot usage and applying the golden rules we can eliminate a batch and get concurrency within a partition.
  29. Simplification Rules of Data Dependency. Each of the golden rules implies a simplification rule that we can use to further compress command execution ■ Reads of a slot must be able to observe any writes to that slot that came before them ■ But reads do not have to observe prior writes by literally reading the database ■ The final value in a slot must reflect the last write in the sequence ■ But if a prior write is not read from the database, it can be omitted, as long as the final write happens
  30. Data Dependency Simplification [Diagram: the same commands again, now compressed to two batches.] Reads can directly read prior writes. Overwritten writes can be skipped.
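A toy planner applying both simplification rules (my sketch; the slot names echo the diagrams, everything else is illustrative): pending writes satisfy later reads without touching the database, and only the last write per slot survives.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Iop {
    Read(&'static str),
    Write(&'static str, u64),
}

/// Collapse a sequential IOP stream into the IOPs that must actually hit
/// the database, preserving the golden rules: execute the returned reads
/// first, then the coalesced writes.
fn simplify(iops: &[Iop]) -> (Vec<Iop>, Vec<Iop>) {
    let mut pending: HashMap<&str, u64> = HashMap::new(); // last write per slot
    let mut db_reads = Vec::new();

    for iop in iops {
        match iop {
            Iop::Read(slot) => {
                // Rule 1: a read may observe a prior write directly, so
                // only slots with no pending write need a database read.
                if !pending.contains_key(slot) {
                    db_reads.push(iop.clone());
                }
            }
            Iop::Write(slot, value) => {
                // Rule 2: an overwritten, unread write can be omitted;
                // the map keeps only the final value for each slot.
                pending.insert(*slot, *value);
            }
        }
    }

    let db_writes = pending
        .into_iter()
        .map(|(slot, value)| Iop::Write(slot, value))
        .collect();
    (db_reads, db_writes)
}

fn main() {
    let stream = [
        Iop::Write("B", 1),
        Iop::Read("B"),     // served from the pending write: no DB read
        Iop::Write("B", 2), // supersedes the first write to B
        Iop::Read("D"),     // cold slot: must really read the database
    ];
    let (reads, writes) = simplify(&stream);
    println!("reads: {:?}", reads);   // [Read("D")]
    println!("writes: {:?}", writes); // [Write("B", 2)]
}
```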
  31. What’s the Point? Data dependency is the mother of sound concurrency. If you can find enough sound concurrency in your problem, then you can exploit the full linear region of ScyllaDB’s awesome concurrency. ■ Case 1: 64 cores, thread and partition concurrency. Maximum throughput about 8 kcommands/s. Very sensitive to “data shape,” a.k.a. partition cardinality. ■ Case 2: 15 cores, reactor and happens-before concurrency. Maximum throughput about 30 kcommands/s. Insensitive to most practical “data shape” issues.
  32. Thank You. Stay in touch: Brian Taylor, brian.taylor@optimizely.com, @netguy204, netguy204, www.linkedin.com/in/brian-ttaylor
