Linear scaling (or near-linear scaling) is often cited in benchmarks, articles, and product comparisons as proof that one technology and its algorithmic optimizations outperform another. But is that really what performance is all about, and should you even care?
This webinar discusses performance beyond linear scalability, including what typically matters more when running high-throughput, low-latency workloads at scale. We'll cover how ScyllaDB offers unparalleled performance and share our insights on:
- The hidden aspects of linear scaling
- When linear scaling matters most and when it’s simply irrelevant
- Often overlooked considerations for optimizing and measuring distributed systems performance
Watch now to learn from our experience (and lessons learned) in building the fastest NoSQL database in the world.
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
1. Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Felipe Mendes, Solution Architect at ScyllaDB
2. The Database for Gamechangers
+ For data-intensive applications that require high throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source solutions
“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
3. +400 Gamechangers Leverage ScyllaDB
Seamless experiences across content + devices
Digital experiences at massive scale
Corporate fleet management
Real-time analytics
2,000,000 SKU e-commerce management
Video recommendation management
Threat intelligence service using JanusGraph
Real time fraud detection across 6M transactions/day
Uber scale, mission critical chat & messaging app
Network security threat detection
Power ~50M X1 DVRs with billions of reqs/day
Precision healthcare via Edison AI
Inventory hub for retail operations
Property listings and updates
Unified ML feature store across the business
Cryptocurrency exchange app
Geography-based recommendations
Global operations: Avon, Body Shop + more
Predictable performance for on sale surges
GPS-based exercise tracking
Serving dynamic live streams at scale
Powering India's top social media platform
Personalized advertising to players
Distribution of game assets in Unreal Engine
4. Introductions
Felipe Mendes, Solution Architect at ScyllaDB
+ Published Author on Linux and Databases
+ Helps teams solve their most challenging problems
+ Years of experience with Linux and distributed systems
5. Agenda
+ (Near) Linear Scaling
+ Enter Real-life
+ ScyllaDB under Load
+ Crafting Your Success
+ Beyond Linear Scaling
7. Linear Speedup
Main goal is to run programs faster
+ To a point…
+ Measured as speedup: serial run time divided by parallel run time
+ Reasons for sub-linear speedup:
+ Laws! (Amdahl's, Gustafson-Barsis)
+ Task Management
+ Communication & Synchronization
Figure: Ideal, typical, and super-linear speedup curves (source: 15.2 Performance in Practice)
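For reference, the quantities named on this slide are usually written as follows. These are the textbook forms, not formulas recovered from the slide, and the symbols are assumptions: T_1 is the run time on one processor, T_p the run time on p processors, and f the parallelizable fraction of the work (for Gustafson-Barsis, f is measured on the scaled, parallel run, so the two laws do not use f in quite the same sense).

```latex
% Speedup on p processors
S(p) = \frac{T_1}{T_p}

% Amdahl's law: fixed problem size, parallel fraction f
S_{\mathrm{Amdahl}}(p) = \frac{1}{(1 - f) + \dfrac{f}{p}}

% Gustafson-Barsis law: problem size grows with p
S_{\mathrm{Gustafson}}(p) = (1 - f) + f\,p
```

For example, with f = 0.95 Amdahl's bound approaches 1 / (1 - 0.95) = 20 as p grows, no matter how many processors are added.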
8. Universal Scaling Law
Generalization of Amdahl’s Law discovered by Dr. Neil Gunther. As the number of users (N) increases, the system throughput (X) will:
+ Enjoy a period of near linear scaling
+ Eventually saturate some resource such that increasing N doesn’t increase X. This defines maxX
+ Possibly encounter a coordination cost that drives down X with further increasing N
Figure: throughput X versus N, showing the Linear Region, the Saturation Region (where maxX is reached), and the Retrograde Region
How Optimizely (Safely) Maximizes Database Concurrency
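The three regions can be reproduced with the usual closed form of the Universal Scaling Law, X(N) = λN / (1 + α(N − 1) + βN(N − 1)). The sketch below is only an illustration: the function names are hypothetical, and the coefficient values (λ for single-user throughput, α for contention, β for coordination cost) are made up rather than measured on any system.

```python
import math


def usl_throughput(n, lam, alpha, beta):
    """Throughput X(N) under the Universal Scaling Law.

    lam   -- throughput of a single user (N = 1)
    alpha -- contention penalty (serialization, as in Amdahl's law)
    beta  -- coordination penalty (crosstalk between users)
    """
    return lam * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_concurrency(alpha, beta):
    """N at which X(N) peaks (maxX); only meaningful when beta > 0."""
    return math.sqrt((1 - alpha) / beta)


# Made-up coefficients, chosen only so that all three regions show up.
LAM, ALPHA, BETA = 1000.0, 0.05, 0.0002

if __name__ == "__main__":
    peak = usl_peak_concurrency(ALPHA, BETA)
    print(f"throughput peaks near N = {peak:.1f}")
    for n in (1, 8, 32, 64, 128, 256):
        x = usl_throughput(n, LAM, ALPHA, BETA)
        print(f"N={n:4d}  X={x:10.1f}  X/N={x / n:8.1f}")
```

With β = 0 the model reduces to Amdahl's law and throughput only flattens; the retrograde region appears once β is non-zero.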
9. Linear Scaling – Good
Relevant for parallel programming, useful for measuring:
+ Database efficiency
+ Price-performance
+ Scalability
NoSQL Benchmark: MongoDB vs ScyllaDB
10. Linear Scaling – Bad
Doesn't account for:
+ Improvements Over Time
+ Application Semantics
+ Hotspots
+ Scaling Clients
+ Consistent Hashing Uneven Distribution
+ Communication Overhead
Figure: Gossip propagation
More on propagating state (and image credits): Gnutella: an Intro to Gossip
13. Consistent Hashing
Exercise: How much more traffic and load does this node receive?
Figure panels: "Bad" and "Better, but not perfect"
Alexys Jacob – Leveraging consistent hashing in your python applications
thelastpickle – The Impacts of Changing the Number of VNodes in Apache Cassandra
Avi Kivity's shard simulator
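One way to explore the exercise is to build a small token ring and measure how much of it each node owns. The sketch below is a toy model under stated assumptions: node names are invented, MD5 stands in for a real partitioner, and the token placement does not reproduce how any particular database assigns tokens. It reports how far the busiest node sits above a perfectly even share as the number of virtual nodes per node changes.

```python
import hashlib
from collections import defaultdict

RING_SIZE = 2 ** 64  # tokens live in [0, 2**64)


def token(label: str) -> int:
    """Map a label to a ring position (MD5 is only a stand-in hash)."""
    digest = hashlib.md5(label.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def ownership(nodes, vnodes):
    """Return the fraction of the ring owned by each node.

    Every node places `vnodes` tokens on the ring; a token owns the
    range between the previous token (exclusive) and itself (inclusive).
    """
    ring = sorted((token(f"{node}#{v}"), node)
                  for node in nodes for v in range(vnodes))
    owned = defaultdict(int)
    for i, (tok, node) in enumerate(ring):
        prev_tok = ring[i - 1][0]  # i == 0 wraps around to the last token
        owned[node] += (tok - prev_tok) % RING_SIZE
    return {node: owned[node] / RING_SIZE for node in nodes}


def imbalance(shares):
    """Largest share divided by the ideal even share (1.0 is perfect)."""
    return max(shares.values()) * len(shares)


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(6)]  # hypothetical cluster
    for vnodes in (1, 16, 256):
        shares = ownership(nodes, vnodes)
        print(f"vnodes={vnodes:4d}  busiest node owns "
              f"{imbalance(shares):.2f}x its fair share")
```

An imbalance of 1.5 would mean the busiest node holds 50% more of the ring than an even split, before any hot keys are considered.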
15. Scaling Clients
Challenges:
+ For a system serving X static clients, what's the max effective concurrency to set on a single client?
+ When scaling clients, how to coordinate them to avoid overwhelming a group of replicas?
Figure: Discord consistent hash-based routing of DB calls (source: How Discord Stores Trillions of Messages)
Figure: Netflix Adaptive Concurrency (source: Performance Under Load – Adaptive Concurrency Limits)
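To make the idea of an adaptive concurrency limit concrete, here is a minimal additive-increase/multiplicative-decrease (AIMD) sketch. It is not Netflix's implementation or API: the class name, parameters, and the fixed latency target are all assumptions chosen for illustration. Each client grows its own limit slowly while requests stay fast and shrinks it sharply on a slow or dropped request, so no central coordinator is needed.

```python
class AIMDLimit:
    """Client-side concurrency limit adjusted from observed latency.

    Hypothetical sketch: additive increase (+1) on a healthy sample,
    multiplicative decrease (x backoff) on a slow or dropped one.
    """

    def __init__(self, initial=10, min_limit=1, max_limit=200,
                 backoff=0.9, latency_target_s=0.05):
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.backoff = backoff
        self.latency_target_s = latency_target_s

    def on_sample(self, latency_s, dropped=False):
        """Update the limit from one completed request; return the new limit."""
        if dropped or latency_s > self.latency_target_s:
            # Back off quickly: the replicas are showing signs of overload.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Probe gently for spare capacity.
            self.limit = min(self.max_limit, self.limit + 1)
        return self.limit


if __name__ == "__main__":
    limiter = AIMDLimit()
    # Made-up latencies in seconds: healthy, healthy, slow, healthy.
    for latency in (0.010, 0.020, 0.200, 0.015):
        print(f"latency={latency:.3f}s -> limit={limiter.on_sample(latency)}")
```

A real client would also gate in-flight requests against `limit`; that part is omitted here to keep the adjustment rule visible.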
23. Run Real Tests
Benchmark tools prove you can get there, but:
+ Application semantics are unique
+ Access patterns are unique
+ Real-life tooling is also unique
+ Addressing all corner-cases is time-consuming or even impossible
+ Don't blindly assume that 2x the nodes will sustain 2x the load
24. Eliminate Noise
Avoid large deployments of small nodes
+ Go Big or Go Home!
+ Considerably reduces the overhead associated with communication & synchronization
+ Less resource overcommitment
+ BUT, keep balance:
+ Account for inevitable failures
+ Leave room for unpredictability
25. Tune the client side
Understand your data flows:
+ Can multiple clients spam a single key?
+ What happens when scaling the number of clients?
+ How is load balancing achieved?
Power of Two Choices load balancing
P99 CONF – Conquering Load Balancing: Experiences from ScyllaDB Drivers
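The "Power of Two Choices" idea can be checked with a short simulation: instead of sending each request to one random server, pick two at random and send it to the less loaded of the pair. The sketch below is a toy model, not how any ScyllaDB driver implements load balancing; the server count, request count, and the use of a simple request counter as "load" are assumptions.

```python
import random


def simulate(num_servers, num_requests, choices, seed=42):
    """Assign requests to servers and return the per-server request count.

    choices=1 is plain random assignment; choices=2 samples two servers
    (with replacement) and picks the one with fewer requests so far.
    """
    rng = random.Random(seed)
    load = [0] * num_servers
    for _ in range(num_requests):
        candidates = [rng.randrange(num_servers) for _ in range(choices)]
        target = min(candidates, key=lambda s: load[s])
        load[target] += 1
    return load


if __name__ == "__main__":
    servers, requests = 100, 100_000
    for choices in (1, 2):
        load = simulate(servers, requests, choices)
        print(f"choices={choices}: min={min(load)} max={max(load)} "
              f"(even share would be {requests // servers})")
```

Comparing the min/max spread for one choice against two gives a feel for how much a single extra sample evens out the load.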
33. Summary
There's much more to performance beyond Linear Scale:
+ Goods and Bads of Linear Scaling
+ Real-life situations impacting linear scalability
+ ScyllaDB Shard-Per-Core Architecture
+ Run Realistic Workloads
+ How ScyllaDB drives the meaning of 'performance'
34. Q&A
ScyllaDB Cloud: Start free trial at scylladb.com/cloud
Feb 14-15 | VIRTUAL EVENT: scylladb.com/summit
Virtual Workshop, January 25, 2024: scylladb.com/events
35. Thank you for joining us today.
@scylladb
slack.scylladb.com