See where an RDBMS-pro’s intuition leads him astray – and learn practical tips for the data modeling transition
ScyllaDB has the potential to deliver impressive performance and scalability. The better you understand how it works, the more you can squeeze out of it. However, developers new to high-performance NoSQL intuitively shoot themselves in the foot with respect to things like table design, query design, indexing, and partitioning.
Watch where our experienced Postgres developer intuitively falls into traps that hurt performance and scalability. And learn with him as our database performance expert offers friendly guidance on navigating all the unexpected behaviors that tend to trip up RDBMS experts.
This webinar focuses on common data modeling and querying mistakes that occur when developers move from SQL to NoSQL. Topics include:
- Understanding query-first design principles
- Planning for schema evolution
- Steering clear of common pitfalls and anti-patterns
- Assessing data access patterns
This isn’t “Death-by-PowerPoint.” We’ll walk through problems encountered while migrating a real application from Postgres to ScyllaDB – and try to fix them live as well.
2. The Database for Gamechangers
+ For data-intensive applications that require high throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source solutions
“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
3. +400 Gamechangers Leverage ScyllaDB
Seamless experiences across content + devices
Digital experiences at massive scale
Corporate fleet management
Real-time analytics
2,000,000 SKU e-commerce management
Video recommendation management
Threat intelligence service using JanusGraph
Real-time fraud detection across 6M transactions/day
Uber scale, mission critical chat & messaging app
Network security threat detection
Power ~50M X1 DVRs with billions of reqs/day
Precision healthcare via Edison AI
Inventory hub for retail operations
Property listings and updates
Unified ML feature store across the business
Cryptocurrency exchange app
Geography-based recommendations
Global operations - Avon, Body Shop + more
Predictable performance for on sale surges
GPS-based exercise tracking
Serving dynamic live streams at scale
Powering India's top social media platform
Personalized advertising to players
Distribution of game assets in Unreal Engine
4. Presenters
Felipe Cardeneti Mendes
+ Puppy Lover
+ Open Source Enthusiast
+ ScyllaDB passionate!
Tim Koopmans
+ Tim Tam's Challenge Winner since 1998
+ Aussie with ideers
+ Driving on the RIGHT side
7. Which Driver to Use?
Who are you?
I am ScyllaDB and I've got 6 shards!
8. Which Driver to Use?
Who are your peers?
What's the schema?
There you have it
system.local
system.peers
system_schema.tables
(...)
9. Which Driver to Use?
Control Connection system.local
system.peers
system_schema.tables
(...)
Shard Awareness
src_port % shard_count = shard_to_connect
Connection Complete
Propagate Changes
Control Connection
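The shard-awareness rule on this slide (src_port % shard_count = shard_to_connect) can be sketched in a few lines of Python. This is a minimal illustration only, not a driver implementation: the function names and the ephemeral port range are assumptions made for the example.

```python
def shard_for_port(src_port: int, shard_count: int) -> int:
    """Shard that a connection from src_port lands on, per the slide's rule."""
    return src_port % shard_count


def pick_source_port(target_shard: int, shard_count: int,
                     start: int = 49152, end: int = 65535) -> int:
    """Return the first port in [start, end] that maps to target_shard.

    The default port range is an assumption for illustration.
    """
    if not 0 <= target_shard < shard_count:
        raise ValueError("target_shard must be in [0, shard_count)")
    for port in range(start, end + 1):
        if shard_for_port(port, shard_count) == target_shard:
            return port
    raise ValueError("no port in range maps to the requested shard")


# A node that reports 6 shards: choose one source port per shard.
ports = {shard: pick_source_port(shard, 6) for shard in range(6)}
```

A driver that wants one connection per shard can choose its local port this way instead of opening connections until every shard happens to be covered.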
10. Prepared Statements & Token Awareness
Control Connection system.local
system.peers
system_schema.tables
(...)
Shard Awareness
Prepared Query
Key, Key, Val
Which one to choose?
Hash(key) is owned by node X, shard Y
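The idea behind "Hash(key) is owned by node X, shard Y" can be sketched as a lookup on a token ring. The sketch below is hedged heavily: `toy_token` is a stand-in hash (the real partitioner uses a Murmur3-based hash), `toy_shard` is a simplified proportional mapping, and the ring contents are invented for illustration.

```python
import bisect
import hashlib


def toy_token(key: str) -> int:
    """Stand-in for the partitioner: maps a key to a signed 64-bit token."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def toy_shard(token: int, shard_count: int) -> int:
    """Simplified assumption: spread the token space evenly over the shards."""
    return ((token + 2**63) * shard_count) >> 64


class Ring:
    """Each node token owns the range (previous token, its own token]."""

    def __init__(self, owners: dict):
        self.tokens = sorted(owners)
        self.owners = owners

    def owner(self, token: int) -> str:
        i = bisect.bisect_left(self.tokens, token)
        if i == len(self.tokens):  # past the last token: wrap around the ring
            i = 0
        return self.owners[self.tokens[i]]


ring = Ring({-100: "node-a", 0: "node-b", 100: "node-c"})  # hypothetical ring
token = toy_token("user:42")
target = (ring.owner(token), toy_shard(token, 6))
```

A prepared statement is what makes this possible on the client side: once the driver knows which bound values form the partition key, it can compute the token locally and send the request straight to an owning node and shard.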
11. HA, Failover and Load Balancing
Load Balancing Policies
Tim's Region
Felipe's Region
Where to deploy? Once per region
13. Index == View == Another Table
Base replica
Paired View replica
Coordinator
Client App
Write Something
Store Changes
Update View
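The flow on this slide (write something, store changes, update view) can be modeled as two plain tables kept in sync. This is a toy in-memory sketch, not how replicas are implemented: the class names, the `user_id`/`email` columns, and the synchronous update are all assumptions for illustration.

```python
from collections import defaultdict


class ViewReplica:
    """The 'other table': the same data, keyed by the indexed column."""

    def __init__(self):
        self.rows = defaultdict(set)  # email -> set of user_ids

    def apply(self, user_id, old_email, new_email):
        # Remove the stale view row first, then add the new one.
        if old_email is not None and old_email != new_email:
            self.rows[old_email].discard(user_id)
            if not self.rows[old_email]:
                del self.rows[old_email]
        self.rows[new_email].add(user_id)

    def lookup(self, email):
        return set(self.rows.get(email, set()))


class BaseReplica:
    """Stores the base row, then forwards the change to its paired view."""

    def __init__(self, view_replica):
        self.rows = {}  # user_id -> email
        self.view_replica = view_replica

    def write(self, user_id, email):
        old_email = self.rows.get(user_id)  # needed to find the stale view row
        self.rows[user_id] = email
        self.view_replica.apply(user_id, old_email, email)


view = ViewReplica()
base = BaseReplica(view)
base.write(1, "a@example.com")
base.write(2, "a@example.com")
base.write(1, "b@example.com")  # moves user 1 to a different view row
```

The point of the model is the cost: every base write that touches the indexed column turns into extra work on another table, which is why an index or view is never free.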
14. Which to Choose?
Index cardinality: the number of unique values in the indexed column:
count(distinct(index_column))
FILTERING
MV
SI
15. Selectivity
Low selectivity queries:
● Return a large part of all rows (e.g. 70%)
● Great candidate for filtering
High selectivity queries:
● Return a small part of all rows (e.g. a single row)
● Bad candidate for filtering
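Both measures used on these slides, cardinality as count(distinct(index_column)) and the share of rows a query returns, are easy to estimate on a sample before choosing between filtering, an index, or a view. The sketch below uses an invented ten-row sample; the column names and values are assumptions.

```python
def cardinality(rows, column):
    """Number of distinct values in a column: count(distinct(column))."""
    return len({row[column] for row in rows})


def matched_fraction(rows, column, value):
    """Share of all rows that an equality predicate on column would return."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[column] == value) / len(rows)


# Hypothetical sample: 10 users, 7 of them in one country.
rows = [
    {"id": i, "country": "BR" if i < 7 else "AU", "email": f"user{i}@example.com"}
    for i in range(10)
]

by_country = matched_fraction(rows, "country", "BR")            # most rows match
by_email = matched_fraction(rows, "email", "user3@example.com")  # one row matches
```

In the slide's terms, the `country` predicate returns 70% of the rows and is a candidate for filtering, while the `email` predicate returns a single row and is a poor one.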
20. What about the Read Path?
For the same reason, be mindful of:
+ SELECT COUNT( * )
+ SELECT * FROM x WHERE key IN ( ... )
+ ALLOW FILTERING
Whenever needed, divide and conquer
Parallel Efficient Full Table Scans with ScyllaDB – Just code please
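One way to "divide and conquer" a full scan is to split the token space into contiguous subranges and scan each one independently. The sketch below only computes the ranges and builds the query text; the keyspace, table, and partition key names are hypothetical, and it assumes signed 64-bit tokens.

```python
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_ring(parts: int):
    """Split the full token space into `parts` contiguous inclusive ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    total = 2**64
    bounds = [MIN_TOKEN + (total * i) // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(parts)]


def scan_query(lo: int, hi: int) -> str:
    # ks.tbl and pk are hypothetical names used only for illustration.
    return (f"SELECT * FROM ks.tbl "
            f"WHERE token(pk) >= {lo} AND token(pk) <= {hi}")


ranges = split_token_ring(4)
queries = [scan_query(lo, hi) for lo, hi in ranges]
```

Each query can then be handed to a separate worker, so no single request has to walk the whole table.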
23. Anti Pattern
Writes are sequential, thus reading before writing makes little sense
+ Introduces latency
+ Won't help with receiving "latest" data
+ May potentially lose data:
+ t0 A reads X, Y, Z
+ t1 B writes X, C, D
+ t1 A writes X, Y, D
+ t2 persists C or Y?
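The timeline above can be replayed with a plain dictionary standing in for a three-column row. This is a toy model under a stated assumption: A's write is applied after B's. The column names `c1` to `c3` are invented for the example.

```python
def replay_read_before_write():
    """A reads the row, B writes, then A writes back its stale copy."""
    row = {"c1": "X", "c2": "Y", "c3": "Z"}
    snapshot_a = dict(row)                          # t0: A reads X, Y, Z
    row.update({"c1": "X", "c2": "C", "c3": "D"})   # t1: B writes X, C, D
    snapshot_a["c3"] = "D"                          # A only wanted to change c3
    row.update(snapshot_a)                          # A writes X, Y, D
    return row


def replay_blind_write():
    """Same timeline, but A writes only the column it actually changes."""
    row = {"c1": "X", "c2": "Y", "c3": "Z"}
    row.update({"c1": "X", "c2": "C", "c3": "D"})   # B writes X, C, D
    row.update({"c3": "D"})                         # A writes just c3
    return row


lost = replay_read_before_write()   # B's value for c2 is overwritten by stale Y
kept = replay_blind_write()         # B's value for c2 survives
```

Reading first did not protect A from B's concurrent change; it only gave A a stale copy to write back.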
24. Last Write Wins
"Older" values are overwritten and discarded
+ Clients attach a wall clock timestamp to each request
+ Writes with a higher timestamp prevail over older ones
+ Upon a conflict (two writes carrying the same timestamp):
+ Lexicographically higher value wins:
+ 10 > 1, 10 wins
+ Zebra > Ant, Zebra wins
Timestamp Conflict Resolution
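The rules on this slide amount to a small merge function: the higher timestamp wins, and on a tie the lexicographically higher value wins. The sketch below models a single cell as a `(timestamp, value)` pair with string values; it is a simplified assumption that ignores details such as deletions and expiring data.

```python
def lww_merge(current, incoming):
    """Return the surviving (timestamp, value) pair under last-write-wins."""
    cur_ts, cur_val = current
    new_ts, new_val = incoming
    if new_ts != cur_ts:
        # Different timestamps: the newer write prevails.
        return incoming if new_ts > cur_ts else current
    # Same timestamp: the lexicographically higher value wins.
    return incoming if new_val > cur_val else current


newer_wins = lww_merge((100, "old"), (101, "new"))
tie_numbers = lww_merge((100, "1"), (100, "10"))
tie_words = lww_merge((100, "Zebra"), (100, "Ant"))
```

Because the outcome depends only on the pair being compared and not on arrival order, every replica that sees the same writes converges on the same value.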
25. Good Pattern
If at all possible:
+ Avoid concurrent updates to the same key; or
+ Assign a unique UUID or TimeUUID for the key; or
+ Accept some lost writes
If impossible:
+ Don't try to fix it yourself
+ Use Paxos via Lightweight Transactions
Timestamp Conflict Resolution
Getting the Most out of Lightweight Transactions in ScyllaDB
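What a lightweight transaction offers is a conditional write: the update applies only if the current value is what the client expects, in the spirit of a CQL `UPDATE ... IF` condition. The sketch below imitates that contract in a single process with a lock; it is an analogy under that assumption and says nothing about how Paxos reaches agreement across replicas.

```python
import threading


class Register:
    """Single-process stand-in for a cell updated with a condition."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        """Apply the write only if the current value equals `expected`.

        Returns (applied, current_value), loosely mirroring how a conditional
        write reports whether it was applied.
        """
        with self._lock:
            if self._value != expected:
                return False, self._value
            self._value = new
            return True, new


reg = Register("Z")
first = reg.compare_and_set("Z", "D")    # applied: the value was still Z
second = reg.compare_and_set("Z", "Q")   # rejected: someone already changed it
```

The rejected writer learns the current value and can decide what to do next, instead of silently overwriting another client's update.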
27. FAQ
+ I deleted data but disk space hasn't been released yet?
+ I deleted data and latency skyrocketed, why?
+ I am quite certain I deleted that data, how come it is back?