
How Optimizely (Safely) Maximizes Database Concurrency.pdf

Apr. 13, 2023

Having a database that’s capable of high concurrency is one thing, but actually tapping all that potential concurrency is another. Fortunately, Optimizely Engineering has developed practical strategies that can help other teams.

Learn how Optimizely Engineering takes full advantage of the high concurrency that’s possible with their NoSQL database, ScyllaDB – while also guaranteeing correctness and protecting the quality of service. Brian Taylor, Principal Software Engineer, will offer a technical deep dive on:

- Understanding concurrency and its impact on throughput and latency
- Closed loop load testing, open loop load testing & the Universal Scaling Law
- The type of load testing you should be performing for capacity planning
- How to identify the region where your database can make the best use of concurrency
- Strategies for optimizing sound concurrency based on your data dependencies

  1. How Optimizely (Safely) Maximizes Database Concurrency
     Brian Taylor, Principal Software Engineer at Optimizely
     Felipe Mendes, Solution Architect at ScyllaDB
  2. Poll: Where are you in your NoSQL adoption?
  3. How Optimizely (Safely) Maximizes Database Concurrency
     Brian Taylor, Principal Software Engineer at Optimizely
     Felipe Mendes, Solution Architect at ScyllaDB
  4. The Database for Gamechangers
     + For data-intensive applications that require high throughput and predictable low latencies
     + Close-to-the-metal design takes full advantage of modern infrastructure
     + >5x higher throughput
     + >20x lower latency
     + >75% TCO savings
     + Compatible with Apache Cassandra and Amazon DynamoDB
     + DBaaS/Cloud, Enterprise and Open Source solutions
     “ScyllaDB stands apart...It’s the rare product that exceeds my expectations.” – Martin Heller, InfoWorld contributing editor and reviewer
     “For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.” – Adrian Bridgewater, Forbes senior contributor
  5. +400 Gamechangers Leverage ScyllaDB
     - Seamless experiences across content + devices
     - Digital experiences at massive scale
     - Corporate fleet management
     - Real-time analytics
     - 2,000,000 SKU e-commerce management
     - Video recommendation management
     - Threat intelligence service using JanusGraph
     - Real time fraud detection across 6M transactions/day
     - Uber scale, mission critical chat & messaging app
     - Network security threat detection
     - Power ~50M X1 DVRs with billions of reqs/day
     - Precision healthcare via Edison AI
     - Inventory hub for retail operations
     - Property listings and updates
     - Unified ML feature store across the business
     - Cryptocurrency exchange app
     - Geography-based recommendations
     - Global operations: Avon, Body Shop + more
     - Predictable performance for on sale surges
     - GPS-based exercise tracking
     - Serving dynamic live streams at scale
     - Powering India's top social media platform
     - Personalized advertising to players
     - Distribution of game assets in Unreal Engine
  6. Introductions
     Felipe Mendes, Solution Architect at ScyllaDB
     + Years of experience with Linux and other distributed systems
     + An open source enthusiast
     + Passion towards helping businesses to achieve their most challenging goals
     Brian Taylor, Principal Software Engineer at Optimizely
     + I am married with 3 young children
     + I have created 2 programming languages and 2 databases for legitimate business reasons
     + I love discovering some property in the solution space that, when maintained, makes everything simpler
  7. Agenda
     ■ ScyllaDB Loves Concurrency
       • To a point
       • How to keep the wheels on the bus
     ■ Mommy, where does concurrency come from?
       • Easy way
       • Good way
  8. ScyllaDB Loves Concurrency
  9. Conceptual Model
     [Diagram: Client, Network, Q, and Execution stages, with S and R marked]
  10. Why Does Concurrency Matter?
      [Diagram: timeline with S, R, and Rtotal marked]
      $R_{\mathrm{eff}} = R_{\mathrm{total}} / 4 = (R + 3S) / 4$
      It lets us hide round trip latency!
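
The formula on this slide covers four requests. A minimal sketch of the same arithmetic for any number of pipelined requests follows. The generalization to `n` and the example timings are assumptions for illustration, not figures from the talk.

```python
def effective_request_time(r: float, s: float, n: int) -> float:
    """Average time per request when n requests are pipelined.

    Assumes the first request pays the full request time r and each of the
    remaining n - 1 requests completes one service time s after the previous
    one, which is the model behind (R + 3S) / 4 for n = 4.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return (r + (n - 1) * s) / n


# Hypothetical timings in milliseconds: R = 2.0, S = 0.5.
one_at_a_time = effective_request_time(2.0, 0.5, 1)
four_in_flight = effective_request_time(2.0, 0.5, 4)
print(one_at_a_time, four_in_flight)
```

As `n` grows, the result approaches `s`: the round trip is paid once and then hidden behind the other requests.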
  11. Definitions
      - Throughput (X): measured in things per second (what you probably care about)
      - Concurrency (N): number of independent requests happening at the same time (the tool of this talk)
      - Service Time (S): how long ScyllaDB says a thing took
      - Request Time (R): how long something takes from the client’s perspective
      One at a time:
      - R = S + round trip time
      - X = 1 / R (pure sequential)
      With concurrency:
      - X = N / R (in the linear region)
      - X = maxX (in the saturation region)
      - 💩 (in the retrograde region)
  12. Classic Closed Loop Load Testing
      [Diagram: several Users, each sending a Request to the SUT (system under test) and receiving a Response]
      ● Users can have up to one request in flight at any time
      ● Users must receive the response to their request before issuing the next request
      ● Not directly useful for modern capacity planning
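
The closed-loop rules above can be sketched with `asyncio`. This is an illustration only: `fake_sut` is a hypothetical stand-in that sleeps instead of querying a database, and the user count and timings are made up.

```python
import asyncio
import time


async def fake_sut(service_time: float) -> None:
    """Hypothetical stand-in for the system under test."""
    await asyncio.sleep(service_time)


async def closed_loop(users: int, requests_per_user: int, service_time: float):
    in_flight = 0
    max_in_flight = 0
    completed = 0

    async def user() -> None:
        nonlocal in_flight, max_in_flight, completed
        for _ in range(requests_per_user):
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
            await fake_sut(service_time)  # wait for the response...
            in_flight -= 1
            completed += 1                # ...before issuing the next request

    start = time.perf_counter()
    await asyncio.gather(*(user() for _ in range(users)))
    elapsed = time.perf_counter() - start
    return completed, max_in_flight, completed / elapsed


completed, max_in_flight, throughput = asyncio.run(closed_loop(4, 5, 0.01))
print(completed, max_in_flight, round(throughput))
```

Concurrency is the input here (it can never exceed the number of users) and throughput is whatever the system delivers at that concurrency.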
  13. Universal Scaling Law
      Generalization of Amdahl’s Law discovered by Dr. Neil Gunther. As the number of users (N) increases, the system throughput (X) will:
      ● Enjoy a period of near linear scaling
      ● Eventually saturate some resource such that increasing N doesn’t increase X. This defines maxX
      ● Possibly encounter a coordination cost that drives down X with further increasing N
      [Plot: ScyllaDB-bench, 3x i3.large, average throughput vs concurrency, with the Linear Region, Saturation Region, Retrograde Region, and maxX marked]
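
The three behaviors listed above fall out of the usual form of the Universal Scaling Law, X(N) = λN / (1 + σ(N − 1) + κN(N − 1)). The sketch below evaluates it. The parameter values are hypothetical and are not fitted to the benchmark shown on the slide.

```python
import math


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """USL throughput at concurrency n.

    lam:   throughput of a single request stream (n = 1)
    sigma: contention coefficient (causes saturation)
    kappa: coordination coefficient (causes the retrograde region)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which X(N) peaks, from setting dX/dN = 0."""
    if kappa <= 0:
        return math.inf  # no coordination cost, so no retrograde region
    return math.sqrt((1 - sigma) / kappa)


# Hypothetical parameters chosen only to show the shape of the curve.
LAM, SIGMA, KAPPA = 1000.0, 0.02, 0.0001
peak_n = usl_peak_concurrency(SIGMA, KAPPA)
for n in (1, 10, peak_n, 4 * peak_n):
    print(round(n), round(usl_throughput(n, LAM, SIGMA, KAPPA)))
```

Past `peak_n`, adding concurrency lowers throughput, which is the retrograde behavior described in the third bullet.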
  14. Modern Open Loop Load Testing
      [Diagram: Constant Throughput Source feeding the SUT]
      Does not model users or think times. Instead models load as a constant throughput load source. Good match for capacity planning internet connected systems where we typically know requests per second but don’t really care how many users are behind that load.
      ● The start time of every request in the test is pre-ordained and does not depend on how the SUT is handling previous requests
      ● Concurrency is theoretically unbounded
      https://github.com/optimizely/scylla-bench-crate
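
A minimal open-loop generator might look like the following. It is a sketch under stated assumptions, not the implementation in the linked repository: the request is a hypothetical `asyncio.sleep`, and the rate and service time are made up.

```python
import asyncio


async def open_loop(rate: float, total: int, service_time: float):
    loop = asyncio.get_running_loop()
    t0 = loop.time()
    in_flight = 0
    max_in_flight = 0
    lateness = []  # how far behind its scheduled start each request began

    async def request(scheduled: float) -> None:
        nonlocal in_flight, max_in_flight
        lateness.append(loop.time() - scheduled)
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
        await asyncio.sleep(service_time)  # hypothetical stand-in for a query
        in_flight -= 1

    tasks = []
    for i in range(total):
        scheduled = t0 + i / rate  # start time fixed in advance
        await asyncio.sleep(max(0.0, scheduled - loop.time()))
        # Fire and move on: the next start never waits for this response.
        tasks.append(asyncio.create_task(request(scheduled)))
    await asyncio.gather(*tasks)
    return max_in_flight, lateness


max_in_flight, lateness = asyncio.run(
    open_loop(rate=200, total=20, service_time=0.05)
)
print(max_in_flight, len(lateness))
```

Throughput is the input and concurrency is an outcome: nothing in the loop caps `in_flight`, so a slow system simply accumulates more outstanding requests.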
  15. Relating the Testing Models
      Closed-loop testing: choose concurrency (N)
      Open-loop testing: choose throughput (X)
      ● X in the linear region will imply a bounded N (and R is very stable)
      ● X in the saturation region can have unbounded N and R is very chaotic
      ● When X exceeds saturation, N and R are undefined because the system will not keep up
      ● The USL is not a single valued function of X: This has interesting implications as X approaches saturation
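
One way to connect the two models is Little's Law, which gives the mean concurrency as N = X · R. The sketch below applies it. The function name and the sample numbers are assumptions, and the law uses the mean request time rather than a percentile.

```python
def implied_concurrency(x: float, r_mean: float, max_x: float) -> float:
    """Mean concurrency implied by an open-loop load, via Little's Law.

    x and max_x are in requests per second; r_mean is the mean request time
    in seconds. Above saturation the backlog grows without bound, so no
    steady-state N exists and the function refuses to answer.
    """
    if x > max_x:
        raise ValueError("offered load exceeds maxX; N and R are undefined")
    return x * r_mean


# Hypothetical load: 4,000 requests/s at a 1 ms mean request time.
n = implied_concurrency(4_000, 0.001, max_x=100_000)
print(n)
```

In a closed-loop test the same relation runs the other way: N is chosen and X = N / R is observed.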
  16. Linear Region
      [Diagram: Network, Q, and Execution stages, with S and R marked]
      - 4 kops/s: R99 = 1.487 ms, 2 <= N <= 22
      - 40 kops/s: R99 = 2.527 ms, 23 <= N <= 255
      - S99 = 0.634 ms, independent of kops/s
  17. Linear Region
      Throughput is directly proportional to concurrency
      ● The size of the cluster (in shards) and its aggregate SSD throughput will determine how large the linear region is
      ● You should engineer your system to take full advantage of the linear region
      [Chart: SSD Throughput Allocation at 4 kops/s and 40 kops/s]
  18. Saturation Region
      [Diagram: Network, Q, and Execution stages, with S and R marked]
      - 100 kops/s: R99 = 17.535 ms, 295 <= N <= 2540
      - S99 is increasing with runtime
  19. Saturation Region
      Throughput is approximately constant, regardless of concurrency
      ● At this point, assuming a well tuned cluster and workload, we are writing to disk as fast as the disks can take it
      ● Concurrency is no longer a mechanism to increase throughput and is now becoming a risk to stability
      ● You should engineer your system to stay out of the saturation region
      [Plot: Linear Region, Saturation Region, and Retrograde Region, with 100 kops/s marked]
  20. Saturation Region
      [Diagram: Foreground (Network, Q, Execution) and Background (Execution, Q, Compaction Debt) work sharing the SSD; SSTable Merges; SSTable Flushes, Commit Log; SSD Throughput Allocation]
  21. Retrograde Region
      [Diagram: Network, Q, and Execution stages, with S and R marked]
      - 100 kops/s: R99 = 55.8 s, 75 <= N <= 4096
      - Once something bounces us into the retrograde region, S99 becomes 1000x worse than linear
  22. Retrograde Region
      Increasing concurrency now decreases throughput. A system that enters this region is likely to get stuck here until demand declines
      ● The harder we push, the less throughput we get and the more demand builds, which makes us want to push harder
      ● “Pushing harder” consumes more client resources (threads, futures, ScyllaDB driver state). The road to hell will terminate with an OOM unless there’s some other limiting factor
      [Plot: Linear Region, Saturation Region, and Retrograde Region, with 100 kops/s and the “Road to hell” marked]
  23. What Have We Learned?
      Stay in the linear region and you’ll enjoy consistent latencies and bounded concurrency. Stray into saturation and you’re likely to get stuck in the retrograde region until load subsides.
      ● Scale ScyllaDB such that you’re “always” going to be operating in the linear region for your expected load
      ● Design concurrency limiting mechanisms that keep you out of the retrograde region during unexpected spikes in load
      ● If you have work to do and can do it in the linear region: DO IT
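
The talk does not say which limiting mechanism Optimizely uses. One possible shape, shown here purely as an assumption, is a client-side cap on in-flight requests that rejects excess work instead of queueing it, so that a spike cannot grow client state without bound. `ConcurrencyLimiter` and `query` are hypothetical names.

```python
import asyncio


class ConcurrencyLimiter:
    """Caps in-flight requests; work beyond the cap is rejected, not queued."""

    def __init__(self, limit: int) -> None:
        self._limit = limit
        self._in_flight = 0

    async def run(self, make_request):
        if self._in_flight >= self._limit:
            raise RuntimeError("over concurrency limit; shed load or retry later")
        self._in_flight += 1
        try:
            return await make_request()
        finally:
            self._in_flight -= 1


async def demo():
    limiter = ConcurrencyLimiter(limit=3)

    async def query():  # hypothetical stand-in for a database call
        await asyncio.sleep(0.01)
        return "ok"

    return await asyncio.gather(
        *(limiter.run(query) for _ in range(5)), return_exceptions=True
    )


results = asyncio.run(demo())
accepted = sum(r == "ok" for r in results)
rejected = sum(isinstance(r, RuntimeError) for r in results)
print(accepted, rejected)
```

The limit would be chosen from load testing so that the cap sits inside the linear region for the cluster in question.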
  24. Where Does Concurrency Come From?
  25. Mechanisms of Concurrency
      These are the boring code wonk answers.
      ● Threads
        ○ Cheap end: 8 KB per goroutine. Low thousands is reasonable
        ○ Expensive end: 1 MB per Java thread. Low hundreds is reasonable
      ● Reactors
        ○ Rust tokio, Java futures, Seastar, Node.js callbacks: typically <1 KB / instance. Tens of thousands is reasonable
      ● Nodes
        ○ $$ limited
  26. Mother of Concurrency
      Data dependency is the mother of sound concurrency
      ● Easy: No dependency. Example: logging facts at independent keys. Write only
      ● Medium: Partitionable dependency. As long as we process each independent stream sequentially, everything will be fine. Example: maintain latest state at a single key
      ● Hard: Arbitrary “happens-before” relationships. Example: add a relationship between two nodes in a graph
  27. Definitions
      ● Command: Represents an atomic unit of work. Contains IOPs. Always concludes with a write, may contain reads.
      ● IOP: IO Operation. A unit of work for the database.
      ● Batch: A group of IOPs that may be executed concurrently. Write IOPs within a batch may literally be combined into a batch operation to ScyllaDB. Batches execute sequentially with other batches
      ● Slot: A cubby for data. Has a name. Can be read or written (partition + clustering key in ScyllaDB)
      ● Concurrency strategy: How we group IOPs into batches such that the final slot state is consistent with commands having all been executed sequentially
  28. No Dependency
      [Diagram: Commands 1 to 5, each with one write (Write A to Write E), grouped into Batch 1 and Batch 2]
      When commands contain no reads and always write to different slots there can be no data dependency. The decision about when to switch from batch 1 to 2 can be arbitrary, or driven by a desire to minimize latency, or to work within the ScyllaDB batch size constraint
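
For the no-dependency case, batching reduces to chunking. The sketch below cuts batches by a size limit only. The limit of 3 and the slot names are illustrative, and a latency-driven cut-off would be an equally valid choice.

```python
from typing import Iterable, Iterator


def batch_writes(
    writes: Iterable[tuple[str, str]], max_batch: int
) -> Iterator[list[tuple[str, str]]]:
    """Group (slot, value) writes into batches of at most max_batch IOPs."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    seen_slots = set()
    batch = []
    for slot, value in writes:
        if slot in seen_slots:
            # The no-dependency strategy is only sound for distinct slots.
            raise ValueError(f"slot {slot!r} written twice")
        seen_slots.add(slot)
        batch.append((slot, value))
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:
        yield batch


writes = [("A", "v1"), ("B", "v2"), ("C", "v3"), ("D", "v4"), ("E", "v5")]
batches = list(batch_writes(writes, max_batch=3))
print([[slot for slot, _ in b] for b in batches])
```

Every IOP inside a batch can run concurrently because no write can affect another.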
  29. Read/Modify/Write and “Happens Before”
      For a read/modify/write (RMW) operation to yield correct results, reads must be able to observe the writes of prior RMWs. Most streaming platforms (Storm, Flink, Kafka Streams, Spark) trivially solve this by partitioning commands into guaranteed independent streams. This means that:
      ● Every command has a happens-before relationship with every following command for the partition key
      ● Cross command concurrency is impossible within a partition
      ● Concurrency is limited by the cardinality of the partition key
  30. Partitioned Concurrency
      [Diagram: Command 1 / Tenant A, Command 2 / Tenant A, and Command 1 / Tenant B, with IOPs Read A, Write B, Read C, Write B, Read D, and Write D spread over Batches 1 to 4 and linked by “Happens before” arrows]
      The work for each tenant is executed strictly in the order it was received. This guarantees that reads will always see prior writes but misses opportunities for greater concurrency by being ignorant of non-interacting slot usage
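
Partitioned concurrency can be sketched as one sequential worker per partition key. The tenant names and the `execute` callback below are hypothetical.

```python
import asyncio
from collections import defaultdict


async def run_partitioned(commands, execute):
    """Run (partition_key, command) pairs, sequentially within each key.

    Returns the number of partitions, which is the most concurrency this
    strategy can ever reach.
    """
    by_partition = defaultdict(list)
    for key, command in commands:
        by_partition[key].append(command)  # keeps arrival order per key

    async def run_one_partition(queue):
        for command in queue:
            await execute(command)  # next command waits for this one

    await asyncio.gather(*(run_one_partition(q) for q in by_partition.values()))
    return len(by_partition)


log = []


async def execute(command):  # hypothetical stand-in for a command's IOPs
    await asyncio.sleep(0.01)
    log.append(command)


commands = [("tenant-a", "a1"), ("tenant-b", "b1"), ("tenant-a", "a2")]
max_concurrency = asyncio.run(run_partitioned(commands, execute))
print(max_concurrency, log)
```

With only two tenants, at most two commands run at once no matter how many cores are available, which is the cardinality limit from the previous slide.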
  31. Golden Rules of Data Dependency
      ● Reads of a slot must be able to observe any writes to that slot that came before them.
        ○ Writes create a happens-before relationship with any reads that follow them
      ● The final value in a slot must reflect the last write in the sequence
        ○ Writes create a happens-before relationship with any writes that follow them
  32. Golden Rule Happens Before
      [Diagram: Command 1 / Tenant A, Command 2 / Tenant A, and Command 1 / Tenant B, with IOPs Read A, Write B, Read C, Write B, Read D, and Write D spread over Batches 1 to 3 and linked by a “Happens before” arrow]
      By examining slot usage and applying the golden rules we can eliminate a batch and get concurrency within a partition
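
A batching strategy built on the golden rules could look like the sketch below, which places each IOP in the earliest batch the rules allow. It is not the speaker's implementation. The `Command` type and the command names are hypothetical, the slot names are borrowed from the diagram, and the sketch adds one conservative rule that is not on the slide (a write waits for earlier reads of its slot).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    reads: tuple[str, ...]  # slots read
    write: str              # slot written; every command ends with a write


def assign_batches(commands):
    """Map (command name, 'R' or 'W', slot) to a batch index, starting at 0."""
    last_write = {}  # slot -> batch of the most recent write
    last_read = {}   # slot -> latest batch containing a read of the slot
    plan = {}
    for cmd in commands:  # commands are in arrival order
        read_batches = []
        for slot in cmd.reads:
            # Golden rule 1: a read follows any earlier write to its slot.
            b = last_write.get(slot, -1) + 1
            plan[(cmd.name, "R", slot)] = b
            last_read[slot] = max(last_read.get(slot, -1), b)
            read_batches.append(b)
        # The write is computed from the command's own reads.
        b = max(read_batches, default=-1) + 1
        # Golden rule 2: a write follows any earlier write to its slot.
        b = max(b, last_write.get(cmd.write, -1) + 1)
        # Assumption beyond the slide: earlier reads must not see this write.
        b = max(b, last_read.get(cmd.write, -1) + 1)
        plan[(cmd.name, "W", cmd.write)] = b
        last_write[cmd.write] = b
    return plan


commands = [
    Command("tenant-a/1", reads=("A",), write="B"),
    Command("tenant-a/2", reads=("C",), write="B"),
    Command("tenant-b/1", reads=("D",), write="D"),
]
plan = assign_batches(commands)
batch_count = max(plan.values()) + 1
print(batch_count)
```

All three reads land in the first batch, and only the second write to slot B needs a batch of its own, because it must follow the first write to B.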
  33. Simplification Rules of Data Dependency
      Each of the golden rules implies a simplification rule that we can use to further compress command execution
      ● Reads of a slot must be able to observe any writes to that slot that came before them.
        ○ Reads do not have to observe prior writes by literally reading the database
      ● The final value in a slot must reflect the last write in the sequence
        ○ If a prior write is not read from the database, it can be omitted as long as the final write happens
  34. Data Dependency Simplification
      [Diagram: Command 1 / Tenant A, Command 2 / Tenant A, and Command 1 / Tenant B, with IOPs Read A, Write B, Read C, Write B, Read D, and Write D spread over Batches 1 and 2 and linked by a “Happens before” arrow]
      Reads can directly read prior writes. Overwritten writes can be skipped.
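
The two simplification rules can be sketched as a planner that decides which IOPs still have to reach the database for a window of commands. This is an illustration under assumptions: the `Command` type and command names are hypothetical, and the sketch only plans the IOPs, it does not compute values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    reads: tuple[str, ...]  # slots read
    write: str              # slot written; every command ends with a write


def plan_simplified(commands):
    """Return (db_reads, db_writes) for a window of commands.

    db_reads can all go in one batch and db_writes in the next; everything
    else is resolved in memory while the commands are applied in order.
    """
    written = set()  # slots written by an earlier command in the window
    db_reads = []    # slots that must really be read from the database
    for cmd in commands:
        for slot in cmd.reads:
            # Simplification 1: a slot written earlier in the window is
            # read from memory, so it needs no database read.
            if slot not in written and slot not in db_reads:
                db_reads.append(slot)
        written.add(cmd.write)
    # Simplification 2: only the final write per slot reaches the database.
    db_writes = sorted(written)
    return db_reads, db_writes


commands = [
    Command("tenant-a/1", reads=("A",), write="B"),
    Command("tenant-a/2", reads=("C",), write="B"),
    Command("tenant-b/1", reads=("D",), write="D"),
]
db_reads, db_writes = plan_simplified(commands)
print(db_reads, db_writes)
```

The first write to slot B never reaches the database, so the whole window fits in two batches: one of reads and one of writes.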
  35. What’s the Point?
      Data dependency is the mother of sound concurrency. If you can find enough sound concurrency in your problem then you can exploit the full linear region of ScyllaDB’s awesome concurrency.
      ● Case 1: 64 cores, thread and partition concurrency. Maximum throughput about 8 kcommands/s. Very sensitive to “data-shape” aka partition cardinality.
      ● Case 2: 15 cores, reactor and happens-before concurrency. Maximum throughput about 30 kcommands/s. Insensitive to most practical “data-shape” issues.
  36. Q&A
      - ScyllaDB University: free online learning, scylladb.com/university
      - ScyllaDB Summit: access 30+ sessions instantly, scylladb.com/summit
      - ScyllaDB Cloud: start free trial, scylladb.com/cloud
  37. Thank you for joining us today.
      @scylladb, scylladb/, slack.scylladb.com, company/scylladb/, medium.com/engineers-optimizely
