Cassandra vs. ScyllaDB: Evolutionary Differences
Peter Corless
+ Listen to customer stories
+ Write blogs & case studies
+ Play (and design) strategy &
roleplaying games
Director of Technical Advocacy
ScyllaDB
3
+ For data-intensive apps that require high
performance and low latency
+ Fully compatible with Apache Cassandra
and Amazon DynamoDB
+ 10X the performance & low tail latency
+ Open Source, Enterprise and Cloud options
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA, USA; Herzliya, Israel; Warsaw,
Poland; Teams around the world
About ScyllaDB
4
+ From “Chasing Cassandra” to “Beyond Cassandra”
+ Comparing Open Source Releases
+ Software Release Cycles
+ Public Perceptions
+ Features
+ Same-Same, Similarities, Differences
+ Benchmarks
Comparison: Cassandra vs. ScyllaDB
5
+ Wide-column NoSQL database (“key-key-value”)
+ Originally a re-architected reimplementation of Cassandra
+ Compatible with C*’s CQL, SSTables, drivers, etc.
+ Written in C++ (not Java) on the Seastar framework
+ Shard-per-core design
+ “Async everywhere”
+ Shared-nothing
+ Futures/promises
+ Now also offers a DynamoDB-compatible API, Alternator
What is ScyllaDB?
Learn more: https://www.scylladb.com/product/technology/
6
“Chasing Cassandra”
+ Scylla's implementation traditionally trailed Cassandra's
+ Playing “catch-up” c. 2016 – 2020
+ Scylla 4.0 went beyond “feature completeness” for Cassandra 3.11
+ Now Scylla has features not found in Cassandra
+ Though Cassandra 4.0 has some features not (yet) present in Scylla
+ Some we’ll add for parity/compatibility
+ Some we’ll go our own way (solve differently, improve or obviate)
7
Scylla Beyond Cassandra
[Venn diagram of Cassandra and Scylla feature sets]
+ Cassandra “core” / Scylla same-same “core” iteration: same/similar feature implemented identically/intercompatible
+ Cassandra-specific implementation / Scylla-specific implementation: same/similar feature implemented differently; may or may not be intercompatible
+ Unique to Cassandra: not in Scylla
+ Unique to Scylla: not in Cassandra
8
Comparing Open Source
Releases
Apache Cassandra vs. ScyllaDB
9
+ June 2017: Cassandra 3.11 released
+ May 2019: First Roadmap for Cassandra 4.0 laid out
+ September 2019: First 4.0 Alpha
+ July 2020: First 4.0 Beta
+ April 26, 2021: Cassandra 4.0rc1
+ April 28, 2021: Cassandra 4.0 “World Party”
+ July 27, 2021: Actual Cassandra 4.0 Release Date
+ September 7, 2021: Cassandra 4.0.1
+ >4 Years from last minor release (3.11) to 4.0
Cassandra 4.0 is Finally Here!
10
Scylla’s Predictable Releases
Aug 2021: Apache Cassandra 4.0 vs. Scylla 4.4: Comparing Performance
11
Scylla Open Source + Enterprise
12
Head-to-Head
+ Scylla engineers make
~5x more commits/month
+ Bigger engineering team —
~50% more active committers
+ More active release cycle —
13x more major/minor releases
over past 3 years
+ More popular with developers —
Scylla exceeds Cassandra
in GitHub stars
13
Comparing Public
Perception
Apache Cassandra vs. ScyllaDB
14
Scylla Moving Up in the [DB-Engines.com] Rankings
ScyllaDB was 4th fastest rising database in
the DB-Engines.com Top 100 from Jan
2021 to Jan 2022 [Rank #85, Score 3.91]
Source: https://db-engines.com/en/ranking
Cassandra remains #11 overall in the
DB-Engines.com ranks for Jan 2022
[Score 123.55]
15
Cassandra 4.0’s effect on DB-Engines.com
Cassandra 4.0 definitely boosted flagging popularity
16
What’s the Same Between
Scylla & Cassandra?
Commonalities
17
Common Ancestry
+ Cassandra and Scylla both
descend from the same historical
antecedents / whitepapers
+ Google’s Bigtable
+ Amazon’s Dynamo
+ Facebook’s Cassandra
+ [Not to be confused with
commercial offerings Google
Cloud Bigtable and Amazon
DynamoDB, or open source
Apache Cassandra]
18
+ Peer-to-peer leaderless topology
+ Replication Factor (RF)
+ Tunable consistency per request
+ Multi datacenter replication
+ CAP Theorem: Availability/Partition
Tolerant “AP”
High Availability
ᐩ No primary/replica complications
ᐩ Homogeneity of nodes
ᐩ Full datacenter loss can be survivable
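A rough way to reason about "tunable consistency per request" is replica-overlap arithmetic: with replication factor RF, a quorum is a majority of replicas, and a read is guaranteed to see the latest acknowledged write when the read and write replica counts together exceed RF. The sketch below is only that arithmetic; the function names are illustrative and not part of any driver API.

```python
def quorum(rf: int) -> int:
    """Smallest majority of rf replicas (what a QUORUM request waits for)."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every set of read replicas must overlap every set of write replicas."""
    return write_acks + read_acks > rf


if __name__ == "__main__":
    rf = 3
    # QUORUM writes plus QUORUM reads overlap on at least one replica.
    print(quorum(rf), read_sees_latest_write(rf, quorum(rf), quorum(rf)))
    # ONE writes plus ONE reads trade that guarantee for availability and latency.
    print(read_sees_latest_write(rf, 1, 1))
```

Choosing the two numbers per request is what lets the same cluster serve both latency-sensitive and correctness-sensitive queries.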
19
Ring Architecture
+ Token ring topology
+ Wide-column “key-key-value”
+ Partition key
+ Clustering key
+ Nodes/vNodes
+ Automatic sharding
+ Same murmur3 partitioner &
hash algorithms
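The ring ideas above (partition key hashed to a token, vNodes owning token ranges, automatic sharding across replicas) can be sketched in a few lines. This is a toy model under stated assumptions: it uses a truncated MD5 digest as a stand-in for the murmur3 partitioner, derives vNode tokens from node names instead of assigning them randomly, and the class and method names are hypothetical.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a key to a signed 64-bit token (stand-in hash, NOT murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each node owns several points (vNodes) on the ring.
        self._ring = sorted(
            (token_for(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [token for token, _ in self._ring]

    def replicas(self, partition_key: str, rf: int = 3):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        start = bisect.bisect_left(self._tokens, token_for(partition_key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == rf:
                break
        return owners


if __name__ == "__main__":
    ring = TokenRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("user:42", rf=3))
```

The clustering key does not appear here because it only orders rows inside a partition; placement on the ring depends on the partition key alone.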
20
Keyspaces, Tables
+ CREATE KEYSPACE
+ CREATE TABLE
+ ALTER KEYSPACE
+ ALTER TABLE
+ DROP KEYSPACE
+ DROP TABLE
ᐩ Pretty much standard Cassandra
Query Language (CQL)
21
Basic CQL CRUD
Operations
+ Create [INSERT]
+ Read [SELECT]
+ Update [UPDATE]
+ Delete [DELETE]
+ WHERE clause
+ ALLOW FILTERING
+ TTL functions
ᐩ Pretty much standard Cassandra
Query Language (CQL)
ᐩ Like SQL, at least at cursory
glance, but do not be lulled into a
false sense of familiarity
22
CQL Drivers
+ ScyllaDB can use all the
same Apache Cassandra and
DataStax drivers
+ Allows ScyllaDB to be swapped in
on the back end without touching
any existing client apps or drivers
ᐩ They will not take advantage of
ScyllaDB’s shard-aware
architecture, but they’ll work
23
+ <1 terabyte
+ 1 to 50 terabytes
+ 50-100 terabytes
+ >100 terabytes
How much data do you have under management in your own
transactional database systems?
Poll Question
24
What’s similar but not
the same?
Cassandra and Scylla differences
25
CQL
+ For the most part, all basic
CQL queries for Cassandra
will work with Scylla
+ Scylla uses the same CQL
wire protocol as Cassandra
ᐩ Scylla does implement some
features differently (we’ll get into
those)
ᐩ Naturally, those differences will
have related CQL commands
ᐩ Implementation lag:
Scylla is compatible with CQL 3.4.0;
current Cassandra CQL is 3.4.5
26
SSTables
+ Scylla supports the same
immutable on-disk SSTable
LSM tree file formats
+ Standard compaction
algorithms are the same
(LCS, STCS, TWCS)
ᐩ Cassandra 4.0 implemented a new
“nb” SSTable file format
ᐩ Scylla will add support for “nb” file
format #8593
// na (4.0-rc1): uncompressed chunks,
pending repair session, isTransient,
checksummed sstable metadata file, new
Bloomfilter format
// nb (4.0.0): originating host id
ᐩ Scylla will also add support for “me”
file format #9869
27
Lightweight
Transactions (LWT)
+ Both use Paxos consensus
algorithm
+ Compare-and-set operations
+ Also called “conditional updates”
ᐩ Scylla can accomplish LWTs in only
3 round trips (Cassandra takes 4)
ᐩ Scylla is more performant / efficient
ᐩ Blog:
https://www.scylladb.com/2020/07/15/getting-the-most-out-of-lightweight-transactions-in-scylla/
Scylla accomplishes LWTs in 3 round trips
Cassandra LWTs take 4 round trips
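What a "conditional update" means to a client can be shown without any consensus machinery. The sketch below is a single-process, in-memory stand-in: it models only the compare-and-set outcome (applied or not applied), not Paxos and not the round trips between replicas. The class and method names are hypothetical.

```python
class ConditionalStore:
    """In-memory model of compare-and-set semantics (no replication, no Paxos)."""

    def __init__(self):
        self._rows = {}

    def insert_if_not_exists(self, key, value) -> bool:
        """Behaves like an insert guarded by IF NOT EXISTS: returns whether it applied."""
        if key in self._rows:
            return False
        self._rows[key] = value
        return True

    def update_if_equals(self, key, expected, new_value) -> bool:
        """Behaves like an update guarded by a condition on the current value."""
        if key not in self._rows or self._rows[key] != expected:
            return False
        self._rows[key] = new_value
        return True

    def get(self, key):
        return self._rows.get(key)


if __name__ == "__main__":
    store = ConditionalStore()
    print(store.insert_if_not_exists("seat-12A", "alice"))   # first claim applies
    print(store.insert_if_not_exists("seat-12A", "bob"))     # second claim is rejected
    print(store.update_if_equals("seat-12A", "alice", "carol"))
```

In a real cluster the same check-then-write must be agreed on by a quorum of replicas, which is where the round-trip counts above come from.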
28
Materialized Views
+ Cassandra: introduced in 3.0
[2015], but still experimental
+ Problems when base table gets
out of sync
+ To this day, major issues like
CASSANDRA-10346 are still open
ᐩ Scylla: production ready since 3.0 [Jan
2019]
ᐩ Serve as the infrastructural basis for
Secondary Indexes
ᐩ Can still get out of sync, but not easily
ᐩ Continually improving implementation
* Read more:
https://www.scylladb.com/2018/09/19/overheard-at-distributed-data-summit/
“If you have them, take them out.”
— Nate McCall PMC Chair,
on Materialized Views in Cassandra [2018]*
29
Secondary Indexes
+ Cassandra: only local Secondary
Indexes (SIs)
+ Scylla: both local and global SIs
+ The choice is now yours!
ᐩ https://www.scylladb.com/2019/07/23/global-or-local-secondary-indexes-in-scylla-the-choice-is-now-yours/
A global indexing query workflow in Scylla
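The practical difference between local and global indexes is how many nodes a lookup by indexed value has to touch when no partition key is given. The toy model below illustrates that under loud assumptions: four nodes, a byte-sum placement function instead of a real partitioner, and hypothetical class and method names. It is not how either database stores its indexes.

```python
from collections import defaultdict

NODES = 4


def owner(value) -> int:
    """Toy placement function (byte sum modulo node count), not a real partitioner."""
    return sum(str(value).encode("utf-8")) % NODES


class ToyCluster:
    def __init__(self):
        # Local index: stored on the same node as the base row it points to.
        self.local_idx = [defaultdict(set) for _ in range(NODES)]
        # Global index: placed by the indexed value, independent of the base row.
        self.global_idx = [defaultdict(set) for _ in range(NODES)]

    def insert(self, partition_key, indexed_value):
        self.local_idx[owner(partition_key)][indexed_value].add(partition_key)
        self.global_idx[owner(indexed_value)][indexed_value].add(partition_key)

    def query_local(self, indexed_value):
        """Without a partition key, every node has to consult its own index."""
        hits = set()
        for node in range(NODES):
            hits |= self.local_idx[node].get(indexed_value, set())
        return hits, NODES

    def query_global(self, indexed_value):
        """One index node is consulted, then only the nodes owning matching rows."""
        index_node = owner(indexed_value)
        keys = set(self.global_idx[index_node].get(indexed_value, set()))
        contacted = {index_node} | {owner(key) for key in keys}
        return keys, len(contacted)


if __name__ == "__main__":
    cluster = ToyCluster()
    for name, country in [("alice", "NL"), ("bob", "US"), ("carol", "NL")]:
        cluster.insert(name, country)
    print(cluster.query_local("NL"))
    print(cluster.query_global("NL"))
```

Local indexes remain attractive when the query also supplies the partition key, because then only the nodes owning that partition are involved.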
30
Change Data Capture (CDC)
+ Introduced in C* 3.8, uses commitlog-like structure
+ Creates indexes as commit logs are written, for
improved performance and reliability
+ Feature enabled through cassandra.yaml
+ CDC can be enabled per table through ALTER TABLE
command
+ Currently, no standard way to read CDC files
+ DataStax planning to open source Kafka Source
connector
+ Advanced Replication from DataStax Labs
+ Community-built example CDC project
CDC in Scylla
ᐩ Implemented as standard CQL Tables
ᐩ Just like adding another table
ᐩ Enabled by default
ᐩ Easy to integrate & consume
+ Deltas (changes) plus pre/post image
+ Replicated in same way as normal data
ᐩ Reasonable overhead
ᐩ TTL prevents unbounded data
ᐩ Easily consumable by Apache Kafka
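The shape of a CDC log that is "just another table" can be sketched as follows: each write appends an entry holding the delta plus the row's pre-image and post-image, and a TTL bounds how long entries live. This is an in-memory illustration only; the field names, the class, and the TTL value used in the example are assumptions, not the actual log schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CdcEntry:
    key: str
    preimage: Optional[dict]   # row before the write (None if it did not exist)
    delta: dict                # columns changed by this write
    postimage: dict            # row after the write
    written_at: float
    ttl_seconds: float


class TableWithCdc:
    def __init__(self, ttl_seconds: float):
        self.rows = {}
        self.log = []
        self.ttl_seconds = ttl_seconds

    def upsert(self, key: str, changes: dict, now: float) -> None:
        before = self.rows.get(key)
        after = {**(before or {}), **changes}
        self.rows[key] = after
        self.log.append(CdcEntry(
            key=key,
            preimage=dict(before) if before is not None else None,
            delta=dict(changes),
            postimage=dict(after),
            written_at=now,
            ttl_seconds=self.ttl_seconds,
        ))

    def live_log(self, now: float):
        """Entries a consumer can still read; expired ones are skipped."""
        return [e for e in self.log if now - e.written_at < e.ttl_seconds]


if __name__ == "__main__":
    table = TableWithCdc(ttl_seconds=10)
    table.upsert("sensor-1", {"temp": 20}, now=0)
    table.upsert("sensor-1", {"humidity": 55}, now=5)
    for entry in table.live_log(now=6):
        print(entry.preimage, entry.delta, entry.postimage)
```

Because the log is ordinary rows, a consumer such as a Kafka connector can read it with normal queries instead of parsing files on disk.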
31
+ Debezium-based
+ Simply consumes CDC data via CQL
+ Doesn’t need to de-dupe data
+ Pumps data into Kafka topics
+ Confluent-certified
+ Less muss & fuss
Kafka CDC Source
Connector
32
Zero Copy Streaming
vs. Row-level Repair
+ Cassandra now can stream
SSTables as a whole
+ Bypasses turning SSTables into
objects (aka “object reification”)
providing 5x better performance
ᐩ Scylla implemented a completely
different approach in 2019
ᐩ Scylla’s row-level repair feature is
used instead of streaming
ᐩ Row-level repair is more:
○ Robust: Better able to endure
interruptions and outages
○ Granular: Only specific rows are
transferred
○ Efficient: There’s no extra data
streaming!
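The "granular" and "efficient" points can be illustrated with a toy repair between two replicas: compare a hash per row, and copy only the rows that differ or are missing. This is a sketch under assumptions (rows are dicts carrying a "ts" write timestamp, newest timestamp wins, SHA-256 as the row hash); it does not reproduce Scylla's actual repair protocol.

```python
import hashlib


def row_hash(row: dict) -> str:
    """Stable hash of a row's contents."""
    return hashlib.sha256(repr(sorted(row.items())).encode("utf-8")).hexdigest()


def repair(replica_a: dict, replica_b: dict) -> int:
    """Synchronise two replicas in place; return how many rows were transferred."""
    transferred = 0
    for key in sorted(set(replica_a) | set(replica_b)):
        row_a, row_b = replica_a.get(key), replica_b.get(key)
        if row_a is not None and row_b is not None and row_hash(row_a) == row_hash(row_b):
            continue  # identical rows: nothing is sent
        present = [row for row in (row_a, row_b) if row is not None]
        winner = max(present, key=lambda row: row["ts"])  # newest write wins
        if row_a != winner:
            replica_a[key] = dict(winner)
            transferred += 1
        if row_b != winner:
            replica_b[key] = dict(winner)
            transferred += 1
    return transferred


if __name__ == "__main__":
    a = {"k1": {"ts": 1, "value": "x"}, "k2": {"ts": 2, "value": "y"}}
    b = {"k1": {"ts": 1, "value": "x"}, "k2": {"ts": 3, "value": "z"},
         "k3": {"ts": 1, "value": "w"}}
    print(repair(a, b), a == b)
```

An interrupted run loses little work in this model, since rows already copied hash identically on the next pass and are skipped.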
33
Netty Async Messaging
+ C* 4.0 integrates async-driven code from
Netty library for communication between
nodes to leverage Java’s Non-Blocking IO
(NIO) capability.
+ A single thread pool for all connections to
corresponding nodes instead of
maintaining N threads per peer.
+ Potentially improves internode
performance issues, providing better tail
latencies and facilitating zero-copy
streaming.
ᐩ Scylla also believes in non-blocking IO
ᐩ Scylla uses asynchronous / non-blocking I/O
in C++ (aio) with its own schedulers
ᐩ Scylla per-core shards maintain as great a
shared-nothing approach as possible; use
async messaging when needed
ᐩ Read:
https://www.scylladb.com/2021/09/15/what-weve-learned-after-6-years-of-io-scheduling/
34
Kubernetes Support & Sidecars
+ Plethora of K8s operators
+ DataStax K8ssandra 1.3+
+ Orange CassKop 2.0+
+ Bitnami Charts
+ [cass-operator deprecated]
+ Sidecars collocated/run on the same
instance as the DB server daemon
+ What Works and What Doesn’t:
https://k8ssandra.io/blog/articles/kubernetes-and-apache-cassandra-what-works-and-what-doesnt/
ᐩ Scylla Operator offers great K8s
support: it just works
ᐩ Scylla Manager Agent is a sidecar
and already included by default with
Scylla Operator
ᐩ https://www.scylladb.com/product/scylla-operator-kubernetes/
35
What’s Just Totally
Different?
Cassandra and Scylla differences
36
Shard-per-Core
Architecture
+ Based on the Seastar framework
(also used in Redpanda,
Red Hat Crimson)
+ Designed/optimized for
multicore systems (scales to
100+ CPUs per node)
ᐩ Cassandra is shard-per-node
ᐩ Scylla balances data with more
granularity
37
Shard-Aware Drivers
ᐩ Our shard-aware Rust driver serves as
the paradigm for our new shard-aware
drivers
ᐩ Still backwards-compatible with
Cassandra
ᐩ Get it on GitHub!
https://github.com/scylladb/scylla-rust-driver
+ Better performance than a
“vanilla” CQL driver
+ “Smart” token-aware clients
direct queries to specific shards
(cores) where data resides
+ Better for consumption of CDC
data tables
+ Up to 25% greater performance
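How a "smart" client might direct a query to the core that owns the data can be sketched as a second mapping step: partition key to token, then token to shard, then a per-shard connection. The mapping below is a simplified proportional split of the signed 64-bit token range and is an assumption for illustration; a real driver follows the sharding parameters the server advertises. The class name and connection labels are hypothetical.

```python
def shard_for(token: int, shard_count: int) -> int:
    """Split the signed 64-bit token range evenly across shards (simplified)."""
    biased = token + 2**63            # shift into the range 0 .. 2**64 - 1
    return (biased * shard_count) >> 64


class ShardAwareClient:
    """Keeps one connection per shard and picks it from the token."""

    def __init__(self, shard_count: int):
        self.shard_count = shard_count
        # Placeholder strings stand in for real per-shard connections.
        self.connections = {s: f"conn-to-shard-{s}" for s in range(shard_count)}

    def connection_for(self, token: int) -> str:
        return self.connections[shard_for(token, self.shard_count)]


if __name__ == "__main__":
    client = ShardAwareClient(shard_count=8)
    for token in (-2**63, 0, 2**63 - 1):
        print(token, client.connection_for(token))
```

A driver without this step still reaches the right node, but the receiving core may have to forward the request to the core that owns the data.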
38
Seedless Gossip
+ Gossip in Cassandra requires seed
nodes, which violates the idea of
homogeneity of nodes
+ Requires manual assignment and
configuration
+ Seed nodes do not bootstrap
+ Complicated to add new seed node
or replace a dead seed node
ᐩ Scylla implemented gossip without
requiring seed nodes
ᐩ More symmetric; less problematic
ᐩ Read more:
https://www.scylladb.com/2020/09/22/seedless-nosql-getting-rid-of-seed-nodes-in-scylla/
39
+ Run your DynamoDB-compatible
workloads anywhere:
+ on AWS or in an AWS Outpost
+ on Google Cloud, Azure, or
+ on-premises
+ Supports DynamoDB Streams
+ Supports Load Balancing
+ Scylla Spark Migrator to move data
to any Scylla cluster anywhere
DynamoDB-compatible
API (Alternator)
ᐩ Cassandra has no comparable feature
40
+ Schema Changes
+ Topology Changes
+ Add or remove any number of
nodes simultaneously
+ Durable and linearizable
+ Background Data Rebalancing
+ Tablets!
+ Immediate, Strong Consistency
of MVs, SIs, CDC tables
+ 1 Round Trip!
Raft in ScyllaDB
Not in Cassandra
41
Benchmarking:
Cassandra 4.0 vs Scylla 4.4
and how Scylla dominates
42
Cassandra 4.0 vs. Scylla 4.4
+ Scylla up to 100x lower P99 latencies
+ Scylla can maintain 2x - 5x throughput
+ Scylla adds nodes 3x faster
43
Scylla 4.4 vs. Cassandra 4.0
+ Cassandra 4.0 cannot
maintain usable
low latencies except
at very low throughput
(≤30-40k ops)
+ Scylla can maintain
low latencies for far
greater throughputs
(≤170-180k ops)
44
Replacing a Node
+ Scylla can heal clusters far
faster than Cassandra 4.0 by
spinning nodes up and
rebalancing data
~3x - 4x faster
45
Doubling Cluster Capacity
+ Scylla doubled a cluster’s
capacity in just over
an hour and a half
(94 minutes)
+ It took Cassandra 4.0
just shy of 4 hours
(238 minutes)
to perform the same task
+ Scylla performed 2.5X faster
46
+ Scylla 4.4: 36 min on a 3-node cluster
+ Cassandra 4.0 took 36x - 63x as long
(nearly a day; or a day and a half!)
+ Cassandra 4.0 performed worse than
Cassandra 3.11 with num_tokens: 16
Major Compaction Speed
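The ratios quoted on the capacity-doubling and major-compaction slides can be checked directly from the timings given there. The snippet uses only numbers from the slides; variable names are illustrative.

```python
# Doubling cluster capacity (minutes, from the slide).
scylla_scale_out_min = 94
cassandra_scale_out_min = 238
scale_out_speedup = cassandra_scale_out_min / scylla_scale_out_min

# Major compaction: Scylla took 36 minutes; Cassandra took 36x to 63x as long.
scylla_compaction_min = 36
cassandra_compaction_hours_low = scylla_compaction_min * 36 / 60
cassandra_compaction_hours_high = scylla_compaction_min * 63 / 60

if __name__ == "__main__":
    print(f"scale-out speedup: {scale_out_speedup:.2f}x")
    print(f"Cassandra major compaction: {cassandra_compaction_hours_low:.1f} h "
          f"to {cassandra_compaction_hours_high:.1f} h")
```

The low end works out to a little under 24 hours and the high end to a little under 38 hours, which matches "nearly a day; or a day and a half".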
47
TCO Comparison: 4 vs. 40
+ 4x i3.metal instances with Scylla
provided the same or better performance
as 40 nodes of Cassandra on i3.4xlarge
+ Cassandra had 640 vCPUs
+ Scylla had 288 vCPUs
+ Scylla got better utility out of hardware
+ Cost savings of 60%
+ Administrative burden/attack surface
reduced by 90%
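The node and vCPU figures on this slide are internally consistent, as the arithmetic below shows. Only the slide's own numbers are used; the 60% cost saving depends on instance pricing that the slide does not list, so it is not derived here.

```python
cassandra_nodes, cassandra_vcpus = 40, 640   # i3.4xlarge cluster
scylla_nodes, scylla_vcpus = 4, 288          # i3.metal cluster

vcpus_per_cassandra_node = cassandra_vcpus // cassandra_nodes
vcpus_per_scylla_node = scylla_vcpus // scylla_nodes

# Fewer machines to patch, monitor and secure.
node_reduction = 1 - scylla_nodes / cassandra_nodes
# Fewer cores doing the same or better work.
vcpu_reduction = 1 - scylla_vcpus / cassandra_vcpus

if __name__ == "__main__":
    print(vcpus_per_cassandra_node, vcpus_per_scylla_node)
    print(f"nodes reduced by {node_reduction:.0%}, vCPUs reduced by {vcpu_reduction:.0%}")
```

The 90% figure for administrative burden corresponds to the node count (4 versus 40), while the vCPU count drops by a smaller fraction.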
48
BLOGS
+ Benchmark, Part 1: Cassandra 4.0 vs. Cassandra 3.11: Comparing Performance
+ Benchmark, Part 2: Apache Cassandra 4.0 vs. Scylla 4.4: Comparing Performance
+ Webinar: Your Questions about Cassandra 4.0 vs. Scylla 4.4 Answered
WEBINAR
+ Comparing Apache Cassandra 4.0, 3.0 and ScyllaDB
Published Benchmarks
United States
2445 Faber St, Suite #200
Palo Alto, CA USA 94303
Israel
Maskit 4
Herzliya, Israel 4673304
www.scylladb.com
@scylladb
Learn NoSQL for free!
university.scylladb.com
@petercorless
Questions?
