AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millisecond Latencies

Glauber Costa
Principal Architect, ScyllaDB

Want to win a ScyllaDB T-shirt?
+ Tweet pictures of you and your plushie at some known landmark
+ Make sure to mention @ScyllaDB!

Today we will cover:
+ What’s ScyllaDB; why ScyllaDB
+ How ScyllaDB helps AdGear win
+ What’s under the hood that allows that to happen

No clear winner in NoSQL
Challenges:
• Cost
• Lock-in
Challenges:
• Scale
• Multi DC
• Latency
Challenges:
• Not persistent
• Manageability
Challenges:
• Price/performance
• Complexity
• JVM..

What we do: Scylla, towards the best NoSQL
+ > 1 million OPS per node
+ < 1ms 99% latency
+ Auto tuned
+ Scale up and out

             Cassandra                               Scylla
Throughput:  Cannot utilize multi-core efficiently   Scales linearly - shard-per-core
Latency:     High due to Java and JVM’s GC           Low and consistent - own cache
Complexity:  Intricate tuning and configuration      Auto tuned, dynamic scheduling
Admin:       Maintenance impacts performance         SLA guarantee for admin vs serving

Scylla Scales UP and OUT
Ingestion time. Every point doubles node size and data per node.
Total data size per node in the i3.16xlarge case is 4.8TB.
[Figure: time to ingest for 1B, 2B, 4B, 8B and 16B rows]

Scylla Scales UP and OUT
nodetool compact from quiescent state. Each point doubles node size and data per node.
4.8TB i3.16xlarge: 2:11:34
[Figure: time to fully compact the node for 0.3TB, 0.6TB, 1.2TB, 2.4TB and 4.8TB per node]

“Nodes must be small in case they fail”
+ No, they don’t.
+ Same clusters as previous experiments.
+ Destroy compacted node, rebuild from remaining two.
[Figure: rebuild results for 1B, 2B, 4B, 8B and 16B rows; data per node 0.3TB, 0.6TB, 1.2TB, 2.4TB and 4.8TB]

About AdGear Samsung Ads
1. AdTech (Advertising Technology) space
2. Started ~10 years ago here in Montreal
▪ Classical Publisher and Advertiser use cases
▪ “Big Data” 250-5k ad impressions / second
3. Then added RTB (Real-Time-Bidding) functionality
▪ Classical buyer/seller use cases
▪ “Big Data” 1M+ transactions / second
4. Then acquired by Samsung VD (Visual Display) while forming Samsung Ads
▪ Classical hardware manufacturer
▪ Unique “Big Data” and opportunities

RTB: Value in execution based on data asymmetry
bob: previously purchased a $4k bike
bob: habitually watches cycling races
bob: is male
bob: db timeout

Requirements for that database:
1. Key-value(s) store
2. Low-latency reads: single-digit milliseconds or less
3. High-throughput to keep up with the rest of the stack volume
4. Horizontal scalability
5. Multi-DC by design
6. Behaves well under mixed concurrent loads:
a. Point Reads X Point Writes X Bulk Writes

Apache Cassandra at AdGear
1. Used Cassandra since 2010 (v0.6) on sun-jdk (1.6)
a. Those were the days of many operational “WTFs” and gnashing of teeth
i. Fun fact! That JVM enters 100% CPU usage on leap second adjustments!
b. But it worked fairly well all things considered
2. Cassandra matured as our company matured:
a. Now with virtual tokens (vnodes), as described in the Dynamo paper. Yay!
b. Now with LevelDB-like compaction strategy. Yay!
c. Now with off-heap low-GC-cost data structures. Yay!
d. Now with G1GC on by default. Yay!
e. Now with forked community vs enterprise roadmap... Yay?

2017 Tipping Point
Cassandra:
• Slowly losing the latency battle
• Node proliferation
• Load-induced deep JVM bugs beyond our capacity to debug -> instability
• Not particularly interested in enterprise-packaged version of the above
What to do:
• What are modern alternatives?
• Have you guys heard of ScyllaDB? Seen them pop up a few times
• Willing to help POC with great engineering guidance!
• Marketed as:
▪ service cassandra stop
▪ service scylladb start

2017 Scylla DB at AdGear
                      Cassandra                  Scylla
Servers               31                         16
Read latency          ~21ms                      <5ms
Backlog and timeouts  As high as 15% at peak ☹   ~0

2017 Scylla DB at AdGear: POC metrics

2018 Scylla DB at AdGear: In Production

Two-level sharding - shard per core
[Diagram: Threads vs. Shards]
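Two-level sharding can be sketched as follows: a partition key hashes to a token, the token picks a node on the ring, and the same token then picks one core (shard) inside that node. This is a minimal illustration only. The hash (SHA-256 standing in for the real partitioner), the ring layout, the node names and the `shard_of` formula are all assumptions made for clarity, not Scylla's actual implementation.

```python
import bisect
import hashlib


def token_of(key):
    """Hash a partition key to a 64-bit token (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Level 1: map a token to the node that owns its token range."""

    def __init__(self, node_tokens):
        # Each node owns the range that ends at its token (hypothetical layout).
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def node_of(self, token):
        # First node whose token is >= the key's token; wrap around at the end.
        idx = bisect.bisect_left(self._tokens, token) % len(self._entries)
        return self._entries[idx][1]


def shard_of(token, shard_count):
    """Level 2: map the same 64-bit token to one core inside the node."""
    return (token * shard_count) >> 64


def route(key, ring, shard_count):
    token = token_of(key)
    return ring.node_of(token), shard_of(token, shard_count)


ring = Ring({"node-a": 2**62, "node-b": 2**63, "node-c": 2**64 - 1})
node, shard = route("user:bob", ring, shard_count=16)
print(node, shard)
```

Because both levels are pure functions of the token, any node can compute which core owns a key without coordination.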

Seastar, Scylla’s engine: “All things async”

Close to the hardware
• Our own memory allocator
• Our own Disk I/O Scheduler
• Our own CPU Scheduler
• Our own cache, bypasses Linux entirely.

The Autonomous NoSQL Database
• SLA for Requests over maintenance operations
• Automatic tuning
• Automatic backpressure
• Scale up/down easily and stream as fast as possible
• Ongoing repair
• Smooths complex data models

Throughput is EASY
• Maybe costly, but easy
• Bruce Wayne can get any throughput he wants from any modern NoSQL, including Cassandra.
LATENCY IS HARD

Dear Scylla,
What do you call a latency distribution for which the high percentiles are much higher than the average?

Three main sources of latencies - Act 1
(Speed mismatch)

How fast is my system?
▪ There are two speeds:
o Disk Speed
o CPU/memory speed
▪ What happens when they are not in sync?
latency mean : 51.9
latency median : 9.8
latency 95th percentile : 125.6
latency 99th percentile : 1184.0 (x 22)
latency 99.9th percentile : 1991.2 (x 38)
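A toy model shows how a speed mismatch alone produces this shape of distribution. In the sketch below, requests arrive at a queue in front of a disk that serves a fixed number of requests per tick; a single burst that the disk cannot absorb leaves the mean almost untouched but inflates the 99th percentile. The tick model, the rates and the burst size are invented for illustration and are not taken from the benchmark above.

```python
import math
from collections import deque


def simulate(arrivals_per_tick, disk_per_tick):
    """FIFO queue in front of a disk; returns each request's latency in ticks."""
    waiting = deque()  # arrival tick of every request still queued
    latencies = []
    tick = 0
    while tick < len(arrivals_per_tick) or waiting:
        arrivals = arrivals_per_tick[tick] if tick < len(arrivals_per_tick) else 0
        waiting.extend([tick] * arrivals)
        # The disk can only complete a fixed number of requests per tick.
        for _ in range(min(disk_per_tick, len(waiting))):
            latencies.append(tick - waiting.popleft() + 1)
        tick += 1
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies):
    mean = sum(latencies) / len(latencies)
    p99 = percentile(latencies, 99)
    return {"mean": mean, "p99": p99, "p99_over_mean": p99 / mean}


# Steady load the disk keeps up with, versus the same load with one burst.
steady = summarize(simulate([1] * 10100, disk_per_tick=2))
bursty = summarize(simulate([1] * 5000 + [100] + [1] * 5000, disk_per_tick=2))
print(steady)
print(bursty)
```

The ratio of a high percentile to the mean, as in the "(x 22)" and "(x 38)" annotations, is a compact way to express how long the tail is.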

The Wall - where is it relevant?
▪ Disk speed slower than CPU speed
o plain slow disk, large payloads
▪ Any other mismatch between resources
o For example, large memory capped by narrow network

The Wall - Results
latency mean : 54.9
latency 99th percentile : 253.9 (x 4.6)
latency 99.9th percentile : 364.6 (x 6.6)

Three main sources of latencies - Act 2
(Lack of respect for limits)

Tasks in Scylla
[Diagram: Traditional stack (threads, each with its own stack) vs. Scylla’s stack (promises and tasks), running on schedulers and CPUs]
Traditional stack:
▪ Thread is a function pointer
▪ Stack is a byte array from 64k to megabytes
Scylla’s stack:
▪ Promise is a pointer to eventually computed value
▪ Task is a pointer to a lambda function
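The promise/task model can be illustrated with a small single-core sketch: a promise holds a value that will exist later, and each continuation attached to it becomes a task (a lambda) on the core's run queue. The `Promise` and `Reactor` classes below are hypothetical names for this illustration and do not reproduce Seastar's C++ API.

```python
from collections import deque


class Reactor:
    """One run queue per core; tasks are short lambdas run to completion."""

    def __init__(self):
        self._tasks = deque()

    def add_task(self, task):
        self._tasks.append(task)

    def run(self):
        while self._tasks:
            self._tasks.popleft()()


class Promise:
    """Holds a value computed later, plus the continuations waiting for it."""

    def __init__(self, reactor):
        self._reactor = reactor
        self._callbacks = []
        self._resolved = False
        self._value = None

    def then(self, fn):
        """Attach a continuation; returns a promise for the continuation's result."""
        nxt = Promise(self._reactor)

        def run(value):
            nxt.set_value(fn(value))

        if self._resolved:
            self._reactor.add_task(lambda: run(self._value))
        else:
            self._callbacks.append(run)
        return nxt

    def set_value(self, value):
        """Resolve the promise and schedule every waiting continuation as a task."""
        self._resolved = True
        self._value = value
        for cb in self._callbacks:
            self._reactor.add_task(lambda cb=cb: cb(value))
        self._callbacks.clear()


reactor = Reactor()
results = []
p = Promise(reactor)
p.then(lambda v: v + 1).then(lambda v: results.append(v * 2))
p.set_value(20)   # e.g. a disk read completing
reactor.run()
print(results)    # [42]
```

No thread blocks and no per-request stack is allocated; the only per-request state is the small promise and task objects.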

The task quota
▪ How often do we check the work queues?
▪ Pre-2.0 defaults too high for latency bound systems
▪ Tasks not respecting it will cause spikes
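Why a task that ignores the quota causes a spike can be seen in a small model. Assume a cooperative reactor that runs tasks back to back and polls its work queues once the time since the last poll reaches the quota; since it cannot interrupt a running task, one long task stretches the gap between polls. The function name and all durations (in microseconds) are illustrative assumptions, not Scylla defaults.

```python
def max_poll_gap(task_costs_us, task_quota_us):
    """Longest time between two polls of the work queues, in microseconds.

    The reactor only checks the clock between tasks, so the quota is a
    target that well-behaved tasks make achievable, not a hard limit.
    """
    longest = 0
    since_poll = 0
    for cost in task_costs_us:
        since_poll += cost              # the task runs to completion
        if since_poll >= task_quota_us:
            longest = max(longest, since_poll)
            since_poll = 0              # poll I/O and other queues here
    return max(longest, since_poll)


well_behaved = [100] * 20
with_rogue_task = [100] * 3 + [20000] + [100] * 16

print(max_poll_gap(well_behaved, task_quota_us=500))      # 500
print(max_poll_gap(with_rogue_task, task_quota_us=500))   # 20300
print(max_poll_gap(well_behaved, task_quota_us=2000))     # 2000
```

The last line shows the other half of the slide: even with well-behaved tasks, a larger quota directly raises the worst-case wait before new work is noticed.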

Three main sources of latencies - Act 3
(Imperfect Isolation)

The I/O Scheduler
[Diagram: Query, Commitlog and Compaction each have their own queue in the userspace I/O scheduler, which sits in front of the disk. Labels: "Max useful disk concurrency", "I/O queued in FS/device", "No queues"]
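The idea in the diagram can be sketched as a userspace scheduler that keeps one queue per class of work, never has more requests in flight than the disk can usefully handle, and picks the next request from the class that has received the least service relative to its shares. The class name, the share values and the fairness rule are assumptions for illustration; the real scheduler is more elaborate.

```python
from collections import deque


class IOScheduler:
    def __init__(self, max_in_flight, shares):
        self.max_in_flight = max_in_flight          # max useful disk concurrency
        self.shares = dict(shares)
        self.queues = {name: deque() for name in shares}
        self.consumed = {name: 0.0 for name in shares}  # service received / shares
        self.in_flight = 0

    def submit(self, io_class, request):
        self.queues[io_class].append(request)

    def dispatch(self):
        """Send requests to the disk until the concurrency cap is reached."""
        sent = []
        while self.in_flight < self.max_in_flight:
            ready = [c for c, q in self.queues.items() if q]
            if not ready:
                break
            # Least-served class first, weighted by its shares.
            chosen = min(ready, key=lambda c: self.consumed[c])
            sent.append((chosen, self.queues[chosen].popleft()))
            self.consumed[chosen] += 1.0 / self.shares[chosen]
            self.in_flight += 1
        return sent

    def complete(self, count=1):
        """Called when the disk finishes requests, freeing dispatch slots."""
        self.in_flight -= count


sched = IOScheduler(max_in_flight=4,
                    shares={"query": 100, "commitlog": 100, "compaction": 10})
for i in range(10):
    sched.submit("compaction", f"compaction-{i}")
    sched.submit("query", f"query-{i}")

first_batch = sched.dispatch()
print(first_batch)
```

Everything beyond the cap waits in the scheduler's own queues, where it can still be reordered by priority, instead of in the filesystem or device queues, where it cannot.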

The I/O Scheduler
• Major component of Scylla since early versions
▪ Central component in The Wall
▪ Getting major improvements for latency workloads in Scylla 2.3

The CPU Scheduler
• Since Scylla 2.0, initial version
▪ disabled by default, AdGear enables it.
▪ enabled in our AWS AMI if using i3 instances.
• 2.2 ships with the full solution
▪ Ships this week!
▪ Enabled by default everywhere.
▪ Much better isolation

The Autonomous Database
[Diagram: the Seastar scheduler arbitrates Memtable, Compaction, Query, Repair and Commitlog work over the SSD, CPU and WAN; a compaction backlog controller and a memory controller each adjust priority]

The controllers - memtable
latency mean : 0.6
latency mean : 0.4

The controllers - compactions
% CPU time used by Compactions
Throughput

workload changes:
- automatic adjustment
- new equilibrium
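The "automatic adjustment, new equilibrium" behaviour is what a feedback controller produces. The sketch below uses a simple proportional controller: compaction shares grow with the compaction backlog, so the backlog settles where compaction work equals incoming writes, and it settles again at a new level when the write rate changes. The gain, the units and the one-share-compacts-one-unit rule are assumptions for illustration, not Scylla's controller.

```python
def simulate_controller(write_rates, gain=0.5, max_shares=1000.0):
    """Return (backlog, shares) after each step for a proportional controller."""
    backlog = 0.0
    history = []
    for incoming in write_rates:
        shares = min(max_shares, gain * backlog)   # more backlog, more priority
        compacted = min(backlog, shares)           # assume 1 share compacts 1 unit/step
        backlog += incoming - compacted
        history.append((backlog, shares))
    return history


# 100 steps at a write rate of 100 units/step, then the workload triples.
history = simulate_controller([100.0] * 100 + [300.0] * 100)
first_backlog, first_shares = history[99]
second_backlog, second_shares = history[199]
print(first_backlog, first_shares)
print(second_backlog, second_shares)
```

In this model the equilibrium backlog is the write rate divided by the gain, so nobody has to retune compaction priority by hand when the workload shifts.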

2ms : 99.9 % latencies at 100 % load
< 2ms : 99 % latencies,
1ms : 95 % latencies.

The controllers - coming soon
• Scylla 2.2: SizeTiered compactions are controlled.
• Scylla 2.3: All compaction strategies are controlled.
• Repairs
▪ Repairs already respect latencies very well, but are not as fast as they could be. Controllers will help unleash their full potential
▪ Done: Scylla Enterprise Manager schedules repairs automatically, no human involvement needed

Summary
• Scylla inherits the user-visible architecture from Cassandra, a solution that is known to scale out very well
• Scylla employs a radically different internal architecture, allowing it to scale up as well as out while keeping latencies predictable
• Scylla reduces TCO across the board, by also minimizing operational expenses.

Thank You!
Resources
slideshare.net/ScyllaDB
glauber@scylladb.com (@glcst)
@scylladb
http://bit.ly/2oHAfok
youtube.com/c/scylladb
scylladb.com/blog
