SlideShare a Scribd company logo
ScyllaDB Architecture -
built for speed
Tzach Livyatan, VP Product
Tzach Livyatan
■ VP Product ScyllaDB
■ Love Databases and NoSQL
■ <cool hobby>
Your photo
goes here,
smile :)
■ High Availability
■ Data Modeling
■ Implementation
Presentation Agenda
Vestibulum congue
Distributed
Node
HW
Control
High Availability
NoSQL – By Data Model
Key / Value Redis, Aerospike, RocksDB
Document store MongoDB, Couchbase
Wide column store Scylla, Apache Cassandra,
HBase, DynamoDB
Graph Neo4j, JanusGraph
Complexity
5
NoSQL– By Availability vs Consistency
6
Pick Two
Availability
Partition Tolerance
Consistency
PACELC:
Latency vs Consistency
Cluster - Node Ring
7
Node 5
Node 1
Node 2
Node 4 Node 3
Data Replication
■ Replication Factor: number of nodes where data (rows and partitions) are replicated
■ Done automatically
■ Set for keyspace
CREATE KEYSPACE mykeyspace WITH replication = {
'class': 'NetworkTopologyStrategy',
'replication_factor' : 3}
AND durable_writes = true;
8
Replication Factor (RF) = 3
9
Node 5
Node 1
Node 2
Node 4 Node 3
Multiple Data Centers
10
USA DC
Asia DC
EU DC
'us_1' : 3,
'eu' : 3,
'asia' : 3
Consistency Level
■ CL: # of nodes that must acknowledge read/write
■ I.E.: 1, QUORUM, LOCAL_QUORUM, ALL
■ Tunable Consistency: CL set per operation
11
12
Cluster Level Write
Cluster Level Read
13
CL =1
Datacenters
14
USA DC
Asia DC
LOCAL
QUORUM
15
Scylla Architecture
Data Modeling
CQL Example
Query:
SELECT * from heartrate_v10 WHERE
pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 LIMIT 1;
SELECT * from heartrate_v10 WHERE
pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 AND
time >= '2021-05-01 01:00+0000' AND
time < '2021-05-01 01:03+0000';
17
https://gist.github.com/tzach/7486f1a0cc904c52f4514f20f14d2a97
Wide Partition Example
CREATE TABLE heartrate_v10 (
pet_chip_id uuid,
owner uuid,
time timestamp,
heart_rate int,
PRIMARY KEY (pet_chip_id, time)
);
pet_chip_id time heart_rate
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:00:00.000000+0000 120
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:01:00.000000+0000 121
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:02:00.000000+0000 120
Partition Key Clustering Key
18
Architecture
pet_chip_id time heart_rate
80d39c78-9dc0-11eb-a8b3-
0242ac130003 2021-05-01 01:00:00.000000+0000 120
80d39c78-9dc0-11eb-a8b3-
0242ac130003 2021-05-01 01:01:00.000000+0000 121
80d39c78-9dc0-11eb-a8b3-
0242ac130003 2021-05-01 01:02:00.000000+0000 120
Partitioner
Hash Function
Partition Key
Token Range
20
Wide Partition Example
Advance Data Modeling
Materialized Views (MV)
Secondary Index (SI)
Change Data Capture (CDC)
Collections
User Defined Types
Time To Live (TTL)
…
1. INSERT INTO heartrate
(pet_chip_id,
Owner,
Time,
heart_rate)
VALUES (..);
2. INSERT INTO
heartrate
Base replica
View replica
Coordinator
3. INSERT INTO
heartrate_by_owner
View is another table
22
View is another table
2.
SELECT * FROM
heartrate_by_owner
WHERE owner = ‘642a..’;
Base replica
View replica
Coordinator
1.
SELECT * FROM
heartrate_by_owner
WHERE owner = ‘642a..’;
23
Global Sec Index - Different Partition key
2.
SELECT name
FROM pet_by_owner_index
WHERE owner = '642a..';
3.
SELECT *
FROM heartrate_10
WHERE pet_chip_id in (...)
AND time in (...)
Base replica
View replica
Coordinator
1.
SELECT * FROM heartrate_v10
WHERE owner = ‘642a..’;
Write Path - Replica
25
26
Read Path - Replica
SSTables
Cache
Memory
Disk
1
2
3
4
5
…
Bloom
Filter
2.5
Storage - Log-Structured Merge Tree
SStable 1
Time
Storage - Log-Structured Merge Tree
SStable 1
Time
SStable 2
SStable 1
SStable 2
SStable 3
Time
SStable 4
SStable 1+2+3
Storage - Log-Structured Merge Tree
Implementation
ScyllaDB Design Decisions
C++ instead of Java
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
ScyllaDB Design Decisions
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
ScyllaDB Design Decisions
Shards
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
Small, Medium, Large Machines
Why larger nodes?
■ Time between failures is
shorter
■ Ease of maintenance
■ No noisy neighbours
■ No virtualization, container
overhead
■ No other moving parts
■ Scale up before out!
Linear Scale Ingestion
Constant Time while volume & throughput double
2X 2X 2X 2X 2X
Network Comparison
Kernel
Cassandra
TCP/IP
Scheduler
queue
queue
queue
queue
queue
threads
NIC
Queues
Kernel
Traditional Stack SeaStar’s Sharded Stack
Memory
Application
TCP/I
P
Task Scheduler
queue
queue
queue
queue
queue
smp queue
NIC
Queue
DPDK
Kernel
(isn’t
involved)
Userspace
Application
TCP/I
P
Task Scheduler
queue
queue
queue
queue
queue
smp queue
NIC
Queue
DPDK
Kernel
(isn’t
involved)
Userspace
Application
TCP/I
P
Task Scheduler
queue
queue
queue
queue
queue
smp queue
NIC
Queue
DPDK
Kernel
(isn’t
involved)
Userspace
Core
Database
Task Scheduler
queue
queue
queue
queue
queue
smp queue
NIC
Queue
Userspace
ScyllaDB Has Its Own Task Scheduler
Traditional Stack Scylla’s Stack
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise
Task
Promise
Task
Promise
Task
Promise
Task
CPU
Promise is a
pointer to
eventually
computed value
Task is a
pointer to a
lambda function
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Scheduler
CPU
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread
Stack
Thread is a
function pointer
Stack is a byte
array from 64k
to megabytes
ScyllaDB Design Decisions
Cassandra Scylla
Key cache
Row cache
On-heap /
Off-heap
Linux page cache
SSTables
Unified cache
SSTables
Complex
Tuning
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
ScyllaDB Design Decisions
Cassandra
Key cache
Row cache
On-heap /
Off-heap
Linux page cache
SSTables
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
App
thread
Kernel
SSD
Page fault
Suspend thread
Initiate I/O
Context switch
I/O
completes
Interrupt
Context
switch
Map page
Resume
thread
ScyllaDB Design Decisions
Query
Commitlog
Compaction
Userspace
I/O
Scheduler
Disk
Max useful disk concurrency
Queue
Queue
Queue
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
ScyllaDB Design Decisions
Memtable
Seastar
Scheduler
Compaction
Query
Repair
Commitlog
SSD
Compaction
Backlog Monitor
Memory Monitor
Adjust priority
Adjust priority
WAN
CPU
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
Different types of loads
■ OLTP
● Small work items
● Latency sensitive
● involves narrow
portion of the data
■ OLAP
● Large work items
● Throughput oriented
● Performed on large
amounts of data
Workload Prioritization
Load #3
800 shares
Load #2
400 shares
Load #1
200 shares
Scylla Design Decisions
1
2 All Things Async
3 Shard per Core
4 Unified Cache
5 I/O Scheduler
6 Autonomous
C++
More than 1M req/sec on i4i.8xlarge
https://github.com/scylladb/1m-ops-demo by Attila Tóth
46
■ Built for High Availability
■ Design to meet modern hardware
■ Use a fully async, share nothing, shard per core architecture
■ Superior throughput and consistent low latency
■ Expose internal scheduler to the user as Workload Prioritization
Summary
Scylla vs Competition
■ 1/7th the cost
■ 26x better in a real life
scenario
■ 10x volume
■ 9.3x throughput
■ 1/4x latency
■ 4 Scylla nodes vs 40
Cassandra
■ 2.5X cheaper
■ 11x better latency
■ 1/5th cost in a benchmark
■ 20x better real-life scenario
■ No throttling
■ No locking
CockroachDB
Google’s
Bigtable
DynamoDB Cassandra
logscaled
Stay in Touch
Tzach Livyatan
tzach@scylladb.com
https://twitter.com/TzachL
https://github.com/tzach
https://www.linkedin.com/in/tzach/

More Related Content

Similar to A Deep Dive into ScyllaDB's Architecture

Scylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla OperatorScylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla Operator
ScyllaDB
 
5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes
Cloudera, Inc.
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
MongoDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
J On The Beach
 
To Serverless and Beyond
To Serverless and BeyondTo Serverless and Beyond
To Serverless and Beyond
ScyllaDB
 
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17
Aaron Benton
 
MySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellMySQL 5.7 in a Nutshell
MySQL 5.7 in a Nutshell
Emily Ikuta
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
Christian Johannsen
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
Patrick Quairoli
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
Ben Stopford
 
KSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success Story
Kristofferson A
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
Splunk
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
Uwe Printz
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
Amazon Web Services
 
Presentation
PresentationPresentation
Presentation
Dimitris Stripelis
 
CFCamp 2016 - Couchbase Overview
CFCamp 2016 - Couchbase OverviewCFCamp 2016 - Couchbase Overview
CFCamp 2016 - Couchbase Overview
Aaron Benton
 
How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdf
ScyllaDB
 
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Severalnines
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database Architecture
ScyllaDB
 

Similar to A Deep Dive into ScyllaDB's Architecture (20)

Scylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla OperatorScylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla Operator
 
5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes5 Apache Spark Tips in 5 Minutes
5 Apache Spark Tips in 5 Minutes
 
MongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: ShardingMongoDB for Time Series Data Part 3: Sharding
MongoDB for Time Series Data Part 3: Sharding
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 
To Serverless and Beyond
To Serverless and BeyondTo Serverless and Beyond
To Serverless and Beyond
 
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17
Couchbase Overview - Monterey Bay Information Technologists Meetup 02.15.17
 
MySQL 5.7 in a Nutshell
MySQL 5.7 in a NutshellMySQL 5.7 in a Nutshell
MySQL 5.7 in a Nutshell
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Quick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage ClusterQuick-and-Easy Deployment of a Ceph Storage Cluster
Quick-and-Easy Deployment of a Ceph Storage Cluster
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
KSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success Story
 
Taking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout SessionTaking Splunk to the Next Level - Architecture Breakout Session
Taking Splunk to the Next Level - Architecture Breakout Session
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
23 October 2013 - AWS 201 - A Walk through the AWS Cloud: Introduction to Ama...
 
Presentation
PresentationPresentation
Presentation
 
CFCamp 2016 - Couchbase Overview
CFCamp 2016 - Couchbase OverviewCFCamp 2016 - Couchbase Overview
CFCamp 2016 - Couchbase Overview
 
How Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdfHow Development Teams Cut Costs with ScyllaDB.pdf
How Development Teams Cut Costs with ScyllaDB.pdf
 
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
Galera Cluster for MySQL vs MySQL (NDB) Cluster: A High Level Comparison
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database Architecture
 

More from ScyllaDB

Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
ScyllaDB
 
Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
ScyllaDB
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
ScyllaDB
 
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
ScyllaDB
 
Noise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, AkamaiNoise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, Akamai
ScyllaDB
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
ScyllaDB
 
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
ScyllaDB
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
ScyllaDB
 
Using Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance TroublesUsing Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance Troubles
ScyllaDB
 
Reducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGCReducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGC
ScyllaDB
 
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
ScyllaDB
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
ScyllaDB
 
Conquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB DriversConquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB Drivers
ScyllaDB
 
Interaction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance MetricInteraction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance Metric
ScyllaDB
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
ScyllaDB
 
99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz
ScyllaDB
 
Square's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with RaftSquare's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with Raft
ScyllaDB
 
Making Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of RustMaking Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of Rust
ScyllaDB
 
A Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus AlbuquerqueA Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus Albuquerque
ScyllaDB
 
The Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of LatencyThe Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of Latency
ScyllaDB
 

More from ScyllaDB (20)

Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
Unconventional Methods to Identify Bottlenecks in Low-Latency and High-Throug...
 
Mitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing SystemsMitigating the Impact of State Management in Cloud Stream Processing Systems
Mitigating the Impact of State Management in Cloud Stream Processing Systems
 
Measuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at TwitterMeasuring the Impact of Network Latency at Twitter
Measuring the Impact of Network Latency at Twitter
 
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
Architecting a High-Performance (Open Source) Distributed Message Queuing Sys...
 
Noise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, AkamaiNoise Canceling RUM by Tim Vereecke, Akamai
Noise Canceling RUM by Tim Vereecke, Akamai
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
 
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
Always-on Profiling of All Linux Threads, On-CPU and Off-CPU, with eBPF & Con...
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
Using Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance TroublesUsing Libtracecmd to Analyze Your Latency and Performance Troubles
Using Libtracecmd to Analyze Your Latency and Performance Troubles
 
Reducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGCReducing P99 Latencies with Generational ZGC
Reducing P99 Latencies with Generational ZGC
 
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
5 Hours to 7.7 Seconds: How Database Tricks Sped up Rust Linting Over 2000X
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
 
Conquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB DriversConquering Load Balancing: Experiences from ScyllaDB Drivers
Conquering Load Balancing: Experiences from ScyllaDB Drivers
 
Interaction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance MetricInteraction Latency: Square's User-Centric Mobile Performance Metric
Interaction Latency: Square's User-Centric Mobile Performance Metric
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
 
99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz99.99% of Your Traces are Trash by Paige Cruz
99.99% of Your Traces are Trash by Paige Cruz
 
Square's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with RaftSquare's Lessons Learned from Implementing a Key-Value Store with Raft
Square's Lessons Learned from Implementing a Key-Value Store with Raft
 
Making Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of RustMaking Python 100x Faster with Less Than 100 Lines of Rust
Making Python 100x Faster with Less Than 100 Lines of Rust
 
A Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus AlbuquerqueA Deep Dive Into Concurrent React by Matheus Albuquerque
A Deep Dive Into Concurrent React by Matheus Albuquerque
 
The Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of LatencyThe Latency Stack: Discovering Surprising Sources of Latency
The Latency Stack: Discovering Surprising Sources of Latency
 

Recently uploaded

Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
Andrey Yasko
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
Zilliz
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
Safe Software
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
RaminGhanbari2
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAIApplying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
ssuserd4e0d2
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
kumarjarun2010
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
The Evolution of Remote Server Management
The Evolution of Remote Server ManagementThe Evolution of Remote Server Management
The Evolution of Remote Server Management
Bert Blevins
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
Adam Dunkels
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
KAMAL CHOUDHARY
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
Edge AI and Vision Alliance
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
Lidia A.
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 

Recently uploaded (20)

Comparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdfComparison Table of DiskWarrior Alternatives.pdf
Comparison Table of DiskWarrior Alternatives.pdf
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
Using LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and MilvusUsing LLM Agents with Llama 3, LangGraph and Milvus
Using LLM Agents with Llama 3, LangGraph and Milvus
 
Data Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining DataData Integration Basics: Merging & Joining Data
Data Integration Basics: Merging & Joining Data
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAIApplying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
Applying Retrieval-Augmented Generation (RAG) to Combat Hallucinations in GenAI
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
The Evolution of Remote Server Management
The Evolution of Remote Server ManagementThe Evolution of Remote Server Management
The Evolution of Remote Server Management
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
Recent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS InfrastructureRecent Advancements in the NIST-JARVIS Infrastructure
Recent Advancements in the NIST-JARVIS Infrastructure
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
“Deploying Large Language Models on a Raspberry Pi,” a Presentation from Usef...
 
WPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide DeckWPRiders Company Presentation Slide Deck
WPRiders Company Presentation Slide Deck
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 

A Deep Dive into ScyllaDB's Architecture

  • 1. ScyllaDB Architecture - built for speed Tzach Livyatan, VP Product
  • 2. Tzach Livyatan ■ VP Product ScyllaDB ■ Love Databases and NoSQL ■ <cool hobby> Your photo goes here, smile :)
  • 3. ■ High Availability ■ Data Modeling ■ Implementation Presentation Agenda Vestibulum congue Distributed Node HW Control
  • 5. NoSQL – By Data Model Key / Value Redis, Aerospike, RocksDB Document store MongoDB, Couchbase Wide column store Scylla, Apache Cassandra, HBase, DynamoDB Graph Neo4j, JanusGraph Complexity 5
  • 6. NoSQL– By Availability vs Consistency 6 Pick Two Availability Partition Tolerance Consistency PACELC: Latency vs Consistency
  • 7. Cluster - Node Ring 7 Node 5 Node 1 Node 2 Node 4 Node 3
  • 8. Data Replication ■ Replication Factor: number of nodes where data (rows and partitions) are replicated ■ Done automatically ■ Set for keyspace CREATE KEYSPACE mykeyspace WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor' : 3} AND durable_writes = true; 8
  • 9. Replication Factor (RF) = 3 9 Node 5 Node 1 Node 2 Node 4 Node 3
  • 10. Multiple Data Centers 10 USA DC Asia DC EU DC 'us_1' : 3, 'eu' : 3, 'asia' : 3
  • 11. Consistency Level ■ CL: # of nodes that must acknowledge read/write ■ I.E.: 1, QUORUM, LOCAL_QUORUM, ALL ■ Tunable Consistency: CL set per operation 11
  • 17. CQL Example Query: SELECT * from heartrate_v10 WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 LIMIT 1; SELECT * from heartrate_v10 WHERE pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 AND time >= '2021-05-01 01:00+0000' AND time < '2021-05-01 01:03+0000'; 17 https://gist.github.com/tzach/7486f1a0cc904c52f4514f20f14d2a97
  • 18. Wide Partition Example CREATE TABLE heartrate_v10 ( pet_chip_id uuid, owner uuid, time timestamp, heart_rate int, PRIMARY KEY (pet_chip_id, time) ); pet_chip_id time heart_rate 80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:00:00.000000+0000 120 80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:01:00.000000+0000 121 80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:02:00.000000+0000 120 Partition Key Clustering Key 18
  • 19. Architecture pet_chip_id time heart_rate 80d39c78-9dc0-11eb-a8b3- 0242ac130003 2021-05-01 01:00:00.000000+0000 120 80d39c78-9dc0-11eb-a8b3- 0242ac130003 2021-05-01 01:01:00.000000+0000 121 80d39c78-9dc0-11eb-a8b3- 0242ac130003 2021-05-01 01:02:00.000000+0000 120 Partitioner Hash Function Partition Key Token Range
  • 21. Advance Data Modeling Materialized Views (MV) Secondary Index (SI) Change Data Capture (CDC) Collections User Defined Types Time To Live (TTL) …
  • 22. 1. INSERT INTO heartrate (pet_chip_id, Owner, Time, heart_rate) VALUES (..); 2. INSERT INTO heartrate Base replica View replica Coordinator 3. INSERT INTO heartrate_by_owner View is another table 22
  • 23. View is another table 2. SELECT * FROM heartrate_by_owner WHERE owner = ‘642a..’; Base replica View replica Coordinator 1. SELECT * FROM heartrate_by_owner WHERE owner = ‘642a..’; 23
  • 24. Global Sec Index - Different Partition key 2. SELECT name FROM pet_by_owner_index WHERE owner = '642a..'; 3. SELECT * FROM heartrate_10 WHERE pet_chip_id in (...) AND time in (...) Base replica View replica Coordinator 1. SELECT * FROM heartrate_v10 WHERE owner = ‘642a..’;
  • 25. Write Path - Replica 25
  • 26. 26 Read Path - Replica SSTables Cache Memory Disk 1 2 3 4 5 … Bloom Filter 2.5
  • 27. Storage - Log-Structured Merge Tree SStable 1 Time
  • 28. Storage - Log-Structured Merge Tree SStable 1 Time SStable 2
  • 29. SStable 1 SStable 2 SStable 3 Time SStable 4 SStable 1+2+3 Storage - Log-Structured Merge Tree
  • 31. ScyllaDB Design Decisions C++ instead of Java 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 32. ScyllaDB Design Decisions 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 33. ScyllaDB Design Decisions Shards 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 34. Small, Medium, Large Machines Why larger nodes? ■ Time between failures is shorter ■ Ease of maintenance ■ No noisy neighbours ■ No virtualization, container overhead ■ No other moving parts ■ Scale up before out!
  • 35. Linear Scale Ingestion Constant Time while volume & throughput double 2X 2X 2X 2X 2X
  • 36. Network Comparison Kernel Cassandra TCP/IP Scheduler queue queue queue queue queue threads NIC Queues Kernel Traditional Stack SeaStar’s Sharded Stack Memory Application TCP/I P Task Scheduler queue queue queue queue queue smp queue NIC Queue DPDK Kernel (isn’t involved) Userspace Application TCP/I P Task Scheduler queue queue queue queue queue smp queue NIC Queue DPDK Kernel (isn’t involved) Userspace Application TCP/I P Task Scheduler queue queue queue queue queue smp queue NIC Queue DPDK Kernel (isn’t involved) Userspace Core Database Task Scheduler queue queue queue queue queue smp queue NIC Queue Userspace
  • 37. ScyllaDB Has Its Own Task Scheduler Traditional Stack Scylla’s Stack Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise Task Promise Task Promise Task Promise Task CPU Promise is a pointer to eventually computed value Task is a pointer to a lambda function Scheduler CPU Scheduler CPU Scheduler CPU Scheduler CPU Scheduler CPU Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread Stack Thread is a function pointer Stack is a byte array from 64k to megabytes
  • 38. ScyllaDB Design Decisions Cassandra Scylla Key cache Row cache On-heap / Off-heap Linux page cache SSTables Unified cache SSTables Complex Tuning 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 39. ScyllaDB Design Decisions Cassandra Key cache Row cache On-heap / Off-heap Linux page cache SSTables 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++ App thread Kernel SSD Page fault Suspend thread Initiate I/O Context switch I/O completes Interrupt Context switch Map page Resume thread
  • 40. ScyllaDB Design Decisions Query Commitlog Compaction Userspace I/O Scheduler Disk Max useful disk concurrency Queue Queue Queue 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 41. ScyllaDB Design Decisions Memtable Seastar Scheduler Compaction Query Repair Commitlog SSD Compaction Backlog Monitor Memory Monitor Adjust priority Adjust priority WAN CPU 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 42. Different types of loads ■ OLTP ● Small work items ● Latency sensitive ● involves narrow portion of the data ■ OLAP ● Large work items ● Throughput oriented ● Performed on large amounts of data
  • 43. Workload Prioritization Load #3 800 shares Load #2 400 shares Load #1 200 shares
  • 44. Scylla Design Decisions 1 2 All Things Async 3 Shard per Core 4 Unified Cache 5 I/O Scheduler 6 Autonomous C++
  • 45. More than 1M req/sec on i4i.8xlarge https://github.com/scylladb/1m-ops-demo by Attila Tóth
  • 46. 46 ■ Built for High Availability ■ Design to meet modern hardware ■ Use a fully async, share nothing, shard per core architecture ■ Superior throughput and consistent low latency ■ Expose internal scheduler to the user as Workload Prioritization Summary
  • 47. Scylla vs Competition ■ 1/7th the cost ■ 26x better in a real life scenario ■ 10x volume ■ 9.3x throughput ■ 1/4x latency ■ 4 Scylla nodes vs 40 Cassandra ■ 2.5X cheaper ■ 11x better latency ■ 1/5th cost in a benchmark ■ 20x better real-life scenario ■ No throttling ■ No locking CockroachDB Google’s Bigtable DynamoDB Cassandra logscaled
  • 48. Stay in Touch Tzach Livyatan tzach@scylladb.com https://twitter.com/TzachL https://github.com/tzach https://www.linkedin.com/in/tzach/