What Developers Need
to Unlearn for High
Performance NoSQL
Felipe Mendes
Tim Koopmans
+ For data-intensive applications that require high
throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of
modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon
DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source
solutions
The Database for Gamechangers
“ScyllaDB stands apart...It’s the rare product
that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the
power a customer will ever need, on workloads that other
databases can’t touch – and at a fraction of the cost of
an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
+400 Gamechangers Leverage ScyllaDB
Seamless experiences
across content + devices
Digital experiences at
massive scale
Corporate fleet
management
Real-time analytics
2,000,000 SKU e-commerce
management
Video recommendation
management
Threat intelligence service
using JanusGraph
Real time fraud detection
across 6M transactions/day
Uber scale, mission critical
chat & messaging app
Network security threat
detection
Power ~50M X1 DVRs with
billions of reqs/day
Precision healthcare via
Edison AI
Inventory hub for retail
operations
Property listings and
updates
Unified ML feature store
across the business
Cryptocurrency exchange
app
Geography-based
recommendations
Global operations- Avon,
Body Shop + more
Predictable performance for
on sale surges
GPS-based exercise
tracking
Serving dynamic live
streams at scale
Powering India's top
social media platform
Personalized
advertising to players
Distribution of game
assets in Unreal Engine
Presenters
Felipe Cardeneti Mendes
+ Puppy Lover
+ Open Source Enthusiast
+ ScyllaDB passionate!
Tim Koopmans
+ Tim Tam's Challenge Winner since 1998
+ Aussie with ideers
+ Driving on the RIGHT side
What Do We Mean by High
Performance NoSQL?
Drivers & App Side
Which Driver to Use?
Who are you?
I am ScyllaDB and I've got 6 shards!
Which Driver to Use?
Who are your peers?
What's the schema?
There you have it
system.local
system.peers
system_schema.tables
(...)
Which Driver to Use?
Control Connection system.local
system.peers
system_schema.tables
(...)
Shard Awareness
src_port % shard_count =
shard_to_connect
Connection Complete
Propagate Changes
Control Connection
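A minimal sketch of the shard-awareness rule above (src_port % shard_count = shard_to_connect), assuming the driver is free to choose its local source port. The function names and the port range 49152-65535 are illustrative assumptions, not any driver's actual API.

```python
import random

def shard_for_port(src_port: int, shard_count: int) -> int:
    """Shard a connection lands on, following the rule on the slide."""
    return src_port % shard_count

def pick_source_port(target_shard: int, shard_count: int,
                     low: int = 49152, high: int = 65535) -> int:
    """Pick a random local port in [low, high] that maps to target_shard."""
    if not 0 <= target_shard < shard_count:
        raise ValueError("target_shard out of range")
    # First port >= low whose remainder equals target_shard.
    first = low + (target_shard - low) % shard_count
    candidates = range(first, high + 1, shard_count)
    return random.choice(candidates)

# One connection per shard for a node that reported 6 shards.
ports = {shard: pick_source_port(shard, 6) for shard in range(6)}
```

A real driver would also have to retry when the chosen port is already in use; that is left out here.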
Prepared Statements & Token Awareness
Control Connection system.local
system.peers
system_schema.tables
(...)
Shard Awareness
Prepared Query
Key, Key, Val
Which one to choose?
Hash(key) is owned by node X, shard Y
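The routing decision "Hash(key) is owned by node X, shard Y" can be sketched with a toy token ring. Everything here is an assumption for illustration: `toy_token` is a stand-in hash (real clusters use the Murmur3 partitioner), and the modulo shard choice is a simplification of the real token-to-shard mapping.

```python
import bisect
import hashlib

def toy_token(partition_key: str) -> int:
    """Stand-in hash, NOT the Murmur3 partitioner used by real clusters."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

class ToyRing:
    def __init__(self, token_owners: dict[int, str], shards_per_node: int):
        self.tokens = sorted(token_owners)
        self.owners = token_owners
        self.shards_per_node = shards_per_node

    def route(self, partition_key: str) -> tuple[str, int]:
        token = toy_token(partition_key)
        # Owner is the first ring token >= the key's token, wrapping around.
        idx = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        node = self.owners[self.tokens[idx]]
        # Simplified shard choice; the real mapping is different.
        shard = token % self.shards_per_node
        return node, shard

ring = ToyRing({-6 * 10**18: "node-a", 0: "node-b", 6 * 10**18: "node-c"},
               shards_per_node=6)
node, shard = ring.route("user:42")
```

Because a prepared statement tells the driver which bound values form the partition key, the driver can run this lookup locally and send the request straight to the owning node and shard.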
HA, Failover and Load Balancing
Load Balancing Policies
Tim's Region
Felipe's Region
Where to deploy? Once per region
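A rough sketch of what a datacenter-aware load balancing policy does, assuming one application deployment per region that prefers coordinators in its own region. The class and the data center names `tim` and `felipe` are hypothetical and do not mirror a specific driver's policy classes.

```python
class DcAwarePolicy:
    """Round-robin over local-DC nodes, falling back to remote nodes."""

    def __init__(self, nodes_by_dc: dict[str, list[str]], local_dc: str):
        self.local = list(nodes_by_dc[local_dc])
        self.remote = [node
                       for dc, nodes in nodes_by_dc.items() if dc != local_dc
                       for node in nodes]
        self.down: set[str] = set()
        self._next = 0

    def pick(self) -> str:
        # Try local nodes first; use remote nodes only if no local node is up.
        for candidates in (self.local, self.remote):
            up = [n for n in candidates if n not in self.down]
            if up:
                node = up[self._next % len(up)]
                self._next += 1
                return node
        raise RuntimeError("no nodes available")

policy = DcAwarePolicy({"tim": ["t1", "t2"], "felipe": ["f1", "f2"]},
                       local_dc="tim")
local_picks = [policy.pick() for _ in range(4)]

policy.down.update({"t1", "t2"})   # simulate a regional outage
failover_pick = policy.pick()
```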
To Index or Not?
Index == View == Another Table
Base replica
Paired View replica
Coordinator
Client App
Write Something
Store Changes
Update View
Which to Choose?
Index cardinality: the number of unique values in the indexed column:
count(distinct(index_column))
FILTERING
MV
SI
Selectivity
Low selectivity queries:
● Return a large part of all rows (e.g. 70%)
● Great candidate for filtering
High selectivity queries:
● Return a small part of all rows (e.g. 1 row)
● Bad candidate for filtering
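To make cardinality and selectivity concrete, here is a small sketch over an in-memory list of rows. The sample data, the column names, and the 50% threshold are assumptions chosen only for illustration.

```python
def cardinality(rows: list[dict], column: str) -> int:
    """Number of distinct values in a column: count(distinct(column))."""
    return len({row[column] for row in rows})

def fraction_returned(rows: list[dict], predicate) -> float:
    """Share of all rows that a query predicate would return."""
    return sum(1 for row in rows if predicate(row)) / len(rows)

def filtering_candidate(fraction: float, threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: filter when most rows match anyway."""
    return fraction >= threshold

# 100 rows: 70 with country "AU", 30 with country "BR".
rows = [{"id": i, "country": "AU" if i % 10 < 7 else "BR"} for i in range(100)]

country_cardinality = cardinality(rows, "country")
broad = fraction_returned(rows, lambda r: r["country"] == "AU")
narrow = fraction_returned(rows, lambda r: r["id"] == 42)
```

The broad query returns 70% of rows, so scanning and filtering wastes little work; the narrow one returns a single row, which is where an index or view pays off.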
To Batch or Not?
Anti-Pattern
Client App
Far too much work for the coordinator
Christopher Batey's – Misuse of unlogged batches
Individual Inserts
Client App
Fully utilize cluster processing power
Christopher Batey's – Misuse of unlogged batches
Good Pattern
Client App
All to the same partition
Christopher Batey's – Misuse of unlogged batches
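One way to apply the good pattern is to group pending rows by partition key on the client, so each batch touches exactly one partition. This is a sketch with hypothetical row and column names; sending each group as a batch through a driver is left out.

```python
from collections import defaultdict

def group_by_partition(rows: list[dict], partition_key: str) -> dict:
    """Bucket rows so every bucket targets a single partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return dict(groups)

pending = [
    {"sensor_id": "s1", "ts": 1, "value": 20.5},
    {"sensor_id": "s2", "ts": 1, "value": 18.0},
    {"sensor_id": "s1", "ts": 2, "value": 20.7},
]

# Each value in `batches` could be sent as one single-partition batch.
batches = group_by_partition(pending, "sensor_id")
```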
What about the Read Path?
For the same reason, be mindful of:
+ SELECT COUNT( * )
+ SELECT * FROM x WHERE key IN ( ... )
+ ALLOW FILTERING
Whenever needed, divide and conquer
Parallel Efficient Full Table Scans with ScyllaDB – Just code please
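"Divide and conquer" for a full scan usually means splitting the token range into sub-ranges that can be queried in parallel. The sketch below assumes the 64-bit signed token range of the default Murmur3 partitioner; the keyspace, table, and column names in the query template are hypothetical.

```python
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

def split_token_range(parts: int) -> list[tuple[int, int]]:
    """Split the full token range into `parts` contiguous inclusive ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // parts
    ranges = []
    start = MIN_TOKEN
    for i in range(parts):
        # The last range absorbs any remainder from the integer division.
        end = MAX_TOKEN if i == parts - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges

# Hypothetical table; each sub-range becomes an independent query.
QUERY = "SELECT * FROM ks.events WHERE token(id) >= %d AND token(id) <= %d"

ranges = split_token_range(8)
queries = [QUERY % (start, end) for start, end in ranges]
```

Each query can then be handed to a separate worker, so the scan is spread across the cluster instead of funnelled through one coordinator.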
Read Before Write?
Concurrent Writes
Designing Data Intensive Applications – Martin Kleppmann
Anti Pattern
Writes are sequential, thus reading before writing makes little sense
+ Introduces latency
+ Won't help with receiving "latest" data
+ May potentially lose data:
+ t0 A reads X, Y, Z
+ t1 B writes X, C, D
+ t1 A writes X, Y, D
+ t2 persists C or Y?
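The timeline above can be replayed with a plain dictionary standing in for a row. The column names are hypothetical; the second half shows, as an assumption about how the client could behave instead, a blind write of only the column that actually changed.

```python
# Read-before-write: client A rewrites the whole row from a stale snapshot.
row = {"col1": "X", "col2": "Y", "col3": "Z"}

snapshot_a = dict(row)                                # t0: A reads X, Y, Z
row.update({"col1": "X", "col2": "C", "col3": "D"})   # t1: B writes X, C, D
snapshot_a["col3"] = "D"                              # A only wanted col3 = D
row.update(snapshot_a)                                # A writes X, Y, D
lost_update_result = dict(row)                        # B's "C" is gone

# Blind write: A sends only the column it changes, with no prior read.
row2 = {"col1": "X", "col2": "Y", "col3": "Z"}
row2.update({"col1": "X", "col2": "C", "col3": "D"})  # B writes X, C, D
row2.update({"col3": "D"})                            # A writes only col3
blind_write_result = dict(row2)                       # B's "C" survives
```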
Last Write Wins
"Older" values are overwritten and discarded
+ Clients attach a wall clock timestamp to each request
+ Writes with a higher timestamp prevail over older ones
+ Upon a conflict (equal timestamps):
+ Lexicographically higher value wins:
+ 10 > 1, 10 wins
+ Zebra > Ant, Zebra wins
Timestamp Conflict Resolution
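A simplified sketch of the last-write-wins rule described above: the higher timestamp wins, and on a timestamp tie the lexicographically higher value wins. It ignores deletes and expiring data, and the `Cell` type is an assumption made for this example.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    timestamp: int   # client-supplied write timestamp
    value: bytes     # serialized column value

def reconcile(a: Cell, b: Cell) -> Cell:
    """Tuple comparison: timestamp first, then value bytes as tie-breaker."""
    return max(a, b)

newer_wins = reconcile(Cell(1, b"first"), Cell(2, b"second"))
tie_numbers = reconcile(Cell(5, b"1"), Cell(5, b"10"))
tie_words = reconcile(Cell(5, b"Ant"), Cell(5, b"Zebra"))
```

Note that nothing here looks at which write the application "meant" to keep, which is why the losing write is silently discarded.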
Good Pattern
If at all possible:
+ Avoid concurrent updates to the same key or;
+ Assign a unique UUID or TimeUUID for the key or;
+ Accept some lost writes
If impossible:
+ Don't try to fix it yourself
+ Use Paxos via Lightweight Transactions
Timestamp Conflict Resolution
Getting the Most out of Lightweight Transactions in ScyllaDB
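The semantics of a conditional write can be sketched with an in-memory stand-in. This is not Paxos and not a driver API: `ToyTable` is a hypothetical class that only mimics the "applied or not applied" outcome of CQL statements such as INSERT ... IF NOT EXISTS and UPDATE ... IF col = ?.

```python
import threading

class ToyTable:
    """In-memory stand-in for conditional (compare-and-set) writes."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, key, value) -> bool:
        # Returns True only for the first writer, like an applied LWT.
        with self._lock:
            if key in self._rows:
                return False
            self._rows[key] = value
            return True

    def update_if(self, key, expected, new_value) -> bool:
        # Applies the write only if the current value matches `expected`.
        with self._lock:
            if self._rows.get(key) != expected:
                return False
            self._rows[key] = new_value
            return True

    def get(self, key):
        return self._rows.get(key)

table = ToyTable()
first_claim = table.insert_if_not_exists("username:tim", "client-a")
second_claim = table.insert_if_not_exists("username:tim", "client-b")
stale_update = table.update_if("username:tim", "client-b", "client-c")
```

The losing client is told its write was not applied, instead of silently losing it under last-write-wins.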
Deletes are Writes
FAQ
+ I deleted data but disk space hasn't been released yet?
+ I deleted data and latency skyrocketed, why?
+ I am quite certain I deleted that data, how come it is back?
Storage/Node View
Incoming Delete
Summary
table
Bloom filters
Memtable
In memory
LSM Tree
Level 0
Level 1
Level 2
Flush
Compaction
Can't evict tombstone
prior to shadowing data
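The rule "can't evict tombstone prior to shadowing data" can be illustrated with a toy compaction. This is a heavily simplified sketch under stated assumptions: SSTables are plain dictionaries mapping a key to (timestamp, value), `None` marks a tombstone, timestamps are in seconds, and `gc_grace_seconds` is passed in as a parameter.

```python
TOMBSTONE = None

def compact(selected: list[dict], others: list[dict],
            now: int, gc_grace_seconds: int) -> dict:
    """Merge `selected` SSTables; `others` are not part of this compaction."""
    merged = {}
    for sstable in selected:
        for key, (ts, value) in sstable.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)

    result = {}
    for key, (ts, value) in merged.items():
        if value is TOMBSTONE:
            expired = now - ts >= gc_grace_seconds
            # Older data for this key still lives outside this compaction.
            shadows_elsewhere = any(
                key in s and s[key][0] < ts for s in others)
            if expired and not shadows_elsewhere:
                continue  # safe to drop the tombstone and the data it covers
        result[key] = (ts, value)
    return result

level0 = {"k": (200, TOMBSTONE)}   # the incoming delete, after flush
level1 = {"k": (100, "v")}         # the data it shadows
level2 = {"other": (50, "x")}

kept = compact([level0], [level1, level2], now=210, gc_grace_seconds=10)
dropped = compact([level0, level1], [level2], now=210, gc_grace_seconds=10)
too_young = compact([level0, level1], [level2], now=205, gc_grace_seconds=10)
```

In the first case the tombstone must stay because the deleted data sits in another level, which is one reason disk space is not released right after a delete.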
Cluster View
Incoming Delete
Request
Response
Cluster View
Evicts Data Evicts Data
Outdated view
Later Read
Got data?
Nope
Cluster View
Outdated view
Later Read
Thou shalt have data
ack
Resurrection Resurrection
Response
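The resurrection sequence in these cluster views can be replayed with three dictionaries standing in for replicas. The repair-on-read helper is a simplified assumption made for illustration; values are (timestamp, value) pairs and `None` marks a tombstone.

```python
def read_with_repair(replicas: list[dict], key: str):
    """Return the newest version seen and copy it to replicas lacking it."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda version: version[0])
    for replica in replicas:
        if key not in replica or replica[key][0] < newest[0]:
            replica[key] = newest
    return newest[1]

replica_a = {"k": (100, "v")}
replica_b = {"k": (100, "v")}
replica_c = {"k": (100, "v")}

# The delete (timestamp 200) reaches A and B, but C misses it.
replica_a["k"] = (200, None)
replica_b["k"] = (200, None)

# Later, A and B evict both the tombstone and the data it shadowed.
del replica_a["k"]
del replica_b["k"]

# C still holds the old value, so a later read brings it back everywhere.
resurrected = read_with_repair([replica_a, replica_b, replica_c], "k")
```

Had the replicas been repaired before the tombstone was evicted, C would have received the tombstone and the old value would have stayed deleted.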
Poll
How much data do you have under
management of your transactional
database?
Q&A
FREE NoSQL Database Training
Tuesday March 19, 2024
scylladb.com/events
Watch now on-demand at
scylladb.com/summit
Virtual Workshop
January 25, 2024
scylladb.com/events
Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
