Developer Data Modeling Mistakes: From Postgres to NoSQL

Felipe Mendes
Tim Koopmans

The Database for Gamechangers
+ For data-intensive applications that require high throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source solutions
“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor

+400 Gamechangers Leverage ScyllaDB
+ Seamless experiences across content + devices
+ Digital experiences at massive scale
+ Corporate fleet management
+ Real-time analytics
+ 2,000,000 SKU e-commerce management
+ Video recommendation management
+ Threat intelligence service using JanusGraph
+ Real-time fraud detection across 6M transactions/day
+ Uber scale, mission critical chat & messaging app
+ Network security threat detection
+ Power ~50M X1 DVRs with billions of reqs/day
+ Precision healthcare via Edison AI
+ Inventory hub for retail operations
+ Property listings and updates
+ Unified ML feature store across the business
+ Cryptocurrency exchange app
+ Geography-based recommendations
+ Global operations: Avon, Body Shop + more
+ Predictable performance for on-sale surges
+ GPS-based exercise tracking
+ Serving dynamic live streams at scale
+ Powering India's top social media platform
+ Personalized advertising to players
+ Distribution of game assets in Unreal Engine

Presenters
Felipe Cardeneti Mendes
+ Puppy Lover
+ Open Source Enthusiast
+ ScyllaDB passionate!
Tim Koopmans
+ Tim Tam's Challenge Winner since 1998
+ Aussie with ideers
+ Driving on the RIGHT side

Rust Speed Racing
Pick Up Where We Left Off

Which Driver to Use?
Who are you?
I am ScyllaDB and I've got 6 shards!

Who are your peers?
What's the schema?
There you have it
system.local
system.peers
system_schema.tables
(...)

Control Connection system.local
system.peers
(...)
Shard Awareness
src_port % shard_count = shard_to_connect
Connection Complete
Propagate Changes
Control Connection
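
The shard-awareness rule on this slide (src_port % shard_count = shard_to_connect) can be sketched as a helper that lists local source ports landing on a chosen shard. This is a minimal illustration, not a driver's actual code: the port range and the function names are assumptions, and a real driver would also bind the socket and retry when a port is already taken.

```python
from typing import Iterator

# Assumed ephemeral port range for illustration; not taken from the slides.
PORT_RANGE = (49152, 65535)


def shard_for_port(src_port: int, shard_count: int) -> int:
    """Shard a connection lands on, per the rule on the slide."""
    return src_port % shard_count


def candidate_ports(target_shard: int, shard_count: int,
                    port_range: tuple = PORT_RANGE) -> Iterator[int]:
    """Yield every source port in the range that maps to target_shard."""
    low, high = port_range
    # First port >= low whose remainder equals target_shard.
    first = low + (target_shard - low) % shard_count
    return iter(range(first, high + 1, shard_count))
```

With 6 shards, as in the earlier "I've got 6 shards!" exchange, every port yielded by `candidate_ports(4, 6)` satisfies `port % 6 == 4`.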

Prepared Statements & Token Awareness
Control Connection system.local
system.peers
(...)
Shard Awareness
Prepared Query
Key, Key, Val
Which one to choose?
Hash(key) is owned by node X, shard Y
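
A rough sketch of the "Hash(key) is owned by node X, shard Y" lookup follows. Everything here is simplified and assumed for illustration: the hash is an MD5 stand-in (not the Murmur3 partitioner), the ring holds a handful of hand-picked tokens, replication is ignored, and the shard mapping is a plain proportional scaling of the token.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Stand-in hash producing a signed 64-bit token (NOT Murmur3)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """Toy token ring: each node owns the ranges ending at its tokens."""

    def __init__(self, node_tokens: dict):
        self._entries = sorted(
            (token, node) for node, tokens in node_tokens.items() for token in tokens
        )
        self._tokens = [token for token, _ in self._entries]

    def owner(self, token: int) -> str:
        # First ring token >= the key's token; wrap around past the last one.
        i = bisect.bisect_left(self._tokens, token)
        return self._entries[i % len(self._entries)][1]


def shard_for(token: int, shard_count: int) -> int:
    """Simplified mapping: scale the token (shifted to unsigned) onto the shards."""
    biased = token + 2**63
    return (biased * shard_count) >> 64
```

A token-aware client would compute `token_for(key)` from the bound values of the prepared statement, then route to `ring.owner(token)` and `shard_for(token, shard_count)` instead of picking a random coordinator.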

HA, Failover and Load Balancing
Load Balancing Policies
Tim's Region
Felipe's Region
Where to deploy? Once per region
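
One way to picture a region-aware load balancing policy is the toy class below: nodes in the application's own region are tried first, rotating between calls, and nodes in other regions serve only as failover. The class and its names are hypothetical and written for this example; they are not a driver API.

```python
class DcAwareRoundRobin:
    """Toy policy: prefer local-region nodes, fall back to remote ones."""

    def __init__(self, nodes_by_dc: dict, local_dc: str):
        self.local = list(nodes_by_dc.get(local_dc, []))
        self.remote = [
            node
            for dc, nodes in nodes_by_dc.items()
            if dc != local_dc
            for node in nodes
        ]
        self._next = 0

    def query_plan(self) -> list:
        """Local nodes first (rotated on every call), then remote nodes."""
        if self.local:
            start = self._next % len(self.local)
            self._next += 1
            rotated = self.local[start:] + self.local[:start]
        else:
            rotated = []
        return rotated + self.remote
```

An application instance deployed in Tim's region would build the policy with that region as `local_dc`, so requests only cross regions when no local node is available.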

Index == View == Another Table
Base replica
Paired View replica
Coordinator
Client App
Write Something
Store Changes
Update View
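
The idea that an index is a view, which is in turn another table, can be modelled with two dictionaries. This is only an in-memory analogy with hypothetical names: every write to the base table also updates a second table keyed by the indexed value, which is the extra work the "Update View" step stands for.

```python
class BaseTableWithView:
    """Toy model: a base table plus a view keyed by the indexed column."""

    def __init__(self):
        self.base = {}  # primary key -> indexed column value
        self.view = {}  # indexed column value -> set of primary keys

    def write(self, pk, value):
        old = self.base.get(pk)
        if old is not None:
            # Remove the stale view row before adding the new one.
            self.view[old].discard(pk)
            if not self.view[old]:
                del self.view[old]
        self.base[pk] = value
        self.view.setdefault(value, set()).add(pk)

    def lookup(self, value):
        """Query by the indexed column, served entirely from the view."""
        return sorted(self.view.get(value, ()))
```

Each `write` touches two structures, which mirrors why a single client write fans out to both a base replica and its paired view replica.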

Which to Choose?
Index cardinality: the number of unique values in the indexed column:
count(distinct(index_column))
Options: FILTERING, MV (materialized view), SI (secondary index)

Selectivity
Low selectivity queries:
● Return a large part of all rows (e.g. 70%)
● Great candidate for filtering
High selectivity queries:
● Return a small part of all rows (e.g. 1 row)
● Bad candidate for filtering
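
Cardinality and selectivity are easy to measure on a sample of rows. The helpers below are a small sketch with made-up function and column names; they only compute the two numbers and deliberately do not encode a threshold for choosing between filtering, a view, or an index.

```python
def cardinality(rows, column):
    """Number of distinct values in a column: count(distinct(column))."""
    return len({row[column] for row in rows})


def matched_fraction(rows, column, value):
    """Share of all rows a query on column = value would return."""
    if not rows:
        return 0.0
    matches = sum(1 for row in rows if row[column] == value)
    return matches / len(rows)


# Hypothetical sample: 7 of 10 rows share one status, 3 are unique.
sample = [
    {"id": i, "status": "active" if i < 7 else f"state-{i}"}
    for i in range(10)
]
```

On this sample, a query for `status = 'active'` returns 70% of the rows (low selectivity), while a query for `status = 'state-9'` returns a single row (high selectivity).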

Anti-Pattern
Client App
Far too much work for the coordinator
Reference: Christopher Batey, "Misuse of unlogged batches"

Individual Inserts
Client App
Fully utilize cluster processing power

Good Pattern
Client App
All to the same partition
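
A minimal sketch of the good pattern: group pending rows by partition key on the client, so that anything sent as a batch targets a single partition. The function and column names are assumptions for illustration, and sending the groups (or plain individual inserts) is left to whichever driver is in use.

```python
from collections import defaultdict


def group_by_partition(rows, partition_key):
    """Group rows so that each group targets exactly one partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)
    return dict(groups)


# Hypothetical rows keyed by a "sensor_id" partition key.
pending = [
    {"sensor_id": "a", "reading": 1},
    {"sensor_id": "b", "reading": 2},
    {"sensor_id": "a", "reading": 3},
]
batches = group_by_partition(pending, "sensor_id")
```

Each value in `batches` can be written as one single-partition batch; rows that share no partition with anything else are better sent as individual inserts so the whole cluster shares the work.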

What about the Read Path?
For the same reason, be mindful of:
+ SELECT COUNT( * )
+ SELECT * FROM x WHERE key IN ( ... )
+ ALLOW FILTERING
Whenever needed, divide and conquer
Parallel Efficient Full Table Scans with ScyllaDB – Just code please
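
"Divide and conquer" for a full scan usually means splitting the token ring into sub-ranges and querying them in parallel. The sketch below only builds the ranges and the query text; it assumes a signed 64-bit token space and reuses the table `x` and column `key` from the slide. Running the queries concurrently is left to the driver.

```python
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n):
    """Split the full token space into n contiguous, non-overlapping ranges."""
    step = 2**64 // n
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # The last range absorbs any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def scan_queries(n):
    """One range-restricted query per sub-range of the ring."""
    return [
        f"SELECT * FROM x WHERE token(key) >= {start} AND token(key) <= {end}"
        for start, end in split_token_ring(n)
    ]
```

Each query touches only a slice of the data, so many small scans can run in parallel across nodes and shards instead of one coordinator handling everything.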

Concurrent Writes
Designing Data Intensive Applications – Martin Kleppmann

Anti Pattern
Writes are sequential, thus reading before writing makes little sense
+ Introduces latency
+ Won't help with receiving "latest" data
+ May lose data:
    + t0: A reads X, Y, Z
    + t1: B writes X, C, D
    + t1: A writes X, Y, D
    + t2: persists C or Y?
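
The timeline above can be replayed with a plain dictionary standing in for the row. This is a sketch with hypothetical column names (`c1`, `c2`, `c3`), and it applies B's write before A's; with the opposite order, A's value would be the one lost.

```python
def read_modify_write(store, snapshot, changes):
    """Write back a full record built from a possibly stale snapshot."""
    record = dict(snapshot)
    record.update(changes)
    store.update(record)


store = {"c1": "X", "c2": "Y", "c3": "Z"}

snapshot_a = dict(store)                           # t0: A reads X, Y, Z
store.update({"c1": "X", "c2": "C", "c3": "D"})    # t1: B writes X, C, D
read_modify_write(store, snapshot_a, {"c3": "D"})  # t1: A writes X, Y, D

# t2: A wrote back the stale "Y" it had read, silently discarding B's "C".
```

Reading first did not protect A from the conflict; it only added a round trip and made A overwrite a column it never meant to change.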

Last Write Wins
"Older" values are overwritten and discarded
+ Clients attach a wall clock timestamp to each request
+ Writes with a higher timestamp prevail over older ones
+ Upon a conflict (equal timestamps):
    + Lexicographically higher value wins:
        + 10 > 1, 10 wins
        + Zebra > Ant, Zebra wins
Timestamp Conflict Resolution
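
Last-write-wins with the tie-break described above fits in a few lines. This is a simplified sketch: the `Write` type is made up for the example, and Python string comparison stands in for comparing the stored values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    timestamp: int  # client-supplied wall clock timestamp
    value: str


def last_write_wins(a: Write, b: Write) -> Write:
    """Higher timestamp wins; on a tie, the lexicographically higher value wins."""
    return max(a, b, key=lambda w: (w.timestamp, w.value))
```

For example, `last_write_wins(Write(5, "10"), Write(5, "1"))` keeps `"10"`, and with equal timestamps `"Zebra"` beats `"Ant"`, matching the two cases on the slide.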

Good Pattern
If at all possible:
+ Avoid concurrent updates to the same key; or
+ Assign a unique UUID or TimeUUID for the key; or
+ Accept some lost writes
If impossible:
+ Don't try to fix it yourself
+ Use Paxos via Lightweight Transactions
Timestamp Conflict Resolution
Getting the Most out of Lightweight Transactions in ScyllaDB
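
The "unique UUID or TimeUUID" option can be illustrated with Python's standard `uuid` module: when every write gets its own key, concurrent writers never touch the same row, so there is nothing for last-write-wins to discard. The dictionary below is a stand-in for a table, and the function name is hypothetical.

```python
import uuid


def append_event(table: dict, payload: str, time_based: bool = True) -> uuid.UUID:
    """Store payload under a fresh key instead of overwriting a shared one."""
    # uuid1() is time-based (version 1); uuid4() is random (version 4).
    key = uuid.uuid1() if time_based else uuid.uuid4()
    table[key] = payload
    return key


events = {}
first = append_event(events, "A's update")
second = append_event(events, "B's update")
```

Both updates survive as separate entries; readers then decide how to combine them, for example by taking the newest one.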

FAQ
+ I deleted data but disk space hasn't been released yet?
+ I deleted data and latency skyrocketed, why?
+ I am quite certain I deleted that data, how come it is back?

Storage/Node View
[Diagram: Incoming Delete → Memtable (in memory, alongside the summary table and Bloom filters) → Flush → LSM Tree (Level 0, Level 1, Level 2) → Compaction]
Can't evict tombstone prior to shadowing data
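
The rule that a tombstone cannot be evicted before the data it shadows can be sketched as a toy compaction. The model is heavily simplified and assumed for illustration: each SSTable is a dictionary of key to (timestamp, value), and grace periods and repair, which also matter in a real cluster, are left out.

```python
TOMBSTONE = object()  # marker value for a deleted cell


def compact(selected, others):
    """Merge `selected` SSTables; keep tombstones that still shadow data in `others`."""
    merged = {}
    for sstable in selected:
        for key, (ts, value) in sstable.items():
            # Keep only the newest version of each key.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)

    result = {}
    for key, (ts, value) in merged.items():
        shadows_elsewhere = any(key in s and s[key][0] < ts for s in others)
        if value is TOMBSTONE and not shadows_elsewhere:
            continue  # nothing left to shadow: tombstone and data both go away
        result[key] = (ts, value)
    return result


level0 = {"k": (2, TOMBSTONE)}  # the flushed delete
level1 = {"k": (1, "v")}        # older data the delete shadows
```

Compacting `level0` alone must keep the tombstone, because `level1` still holds the shadowed value. Disk space is only released once both are compacted together, which is one answer to the first FAQ question.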

Cluster View
[Diagram labels: Incoming Delete, Request, Response]

Cluster View
[Diagram labels: Evicts Data (twice), Outdated view, Later Read, "Got data?", "Nope"]

Cluster View
[Diagram labels: Outdated view, Later Read, "Thou shalt have data", ack, Resurrection (twice), Response]
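
The resurrection scenario in these three diagrams can be replayed with three dictionaries standing in for replicas. This is a simplified model with assumed details: the delete misses one replica, the other two evict both the data and the tombstone, and a later read that syncs replicas to the newest version it sees brings the deleted value back.

```python
TOMBSTONE = "<tombstone>"


def read_with_repair(replicas, key):
    """Return the newest version seen and copy it to every replica."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[0])
    for r in replicas:
        r[key] = newest
    return None if newest[1] == TOMBSTONE else newest[1]


replicas = [{"k": (1, "v")} for _ in range(3)]

# The delete reaches two replicas; the third keeps an outdated view.
for r in replicas[:2]:
    r["k"] = (2, TOMBSTONE)

# The two up-to-date replicas evict the data together with its tombstone.
for r in replicas[:2]:
    del r["k"]

# Later read: the outdated replica is the only one with an answer.
resurrected = read_with_repair(replicas, "k")
```

After the read, all three replicas hold the old value again, which is the "how come it is back?" case from the FAQ.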

Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
