Dissecting Real-World Database Performance Dilemmas
Felipe Mendes, Solution Architect at ScyllaDB
Why Disney Moved from DynamoDB to ScyllaDB
Yichen Wei, Senior Software Engineer; Adam Drennan, Senior Software Engineer
So You've Lost Quorum: Lessons From Accidental Downtime
Bo Ingram, Staff Software Engineer, Persistence Infrastructure
Inside Expedia's Migration to ScyllaDB for Change Data Capture
Jean Carlo Rivera Ura, NoSQL Database Engineer III
MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML
JP Voltani, Director of Engineering
Poll: How much time did you spend troubleshooting performance issues in your database?
+ For data-intensive applications that require high throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source solutions
The Database for Gamechangers
“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
+400 Gamechangers Leverage ScyllaDB
+ Seamless experiences across content + devices
+ Digital experiences at massive scale
+ Corporate fleet management
+ Real-time analytics
+ 2,000,000 SKU e-commerce management
+ Video recommendation management
+ Threat intelligence service using JanusGraph
+ Real-time fraud detection across 6M transactions/day
+ Uber scale, mission critical chat & messaging app
+ Network security threat detection
+ Power ~50M X1 DVRs with billions of reqs/day
+ Precision healthcare via Edison AI
+ Inventory hub for retail operations
+ Property listings and updates
+ Unified ML feature store across the business
+ Cryptocurrency exchange app
+ Geography-based recommendations
+ Global operations: Avon, Body Shop + more
+ Predictable performance for on sale surges
+ GPS-based exercise tracking
+ Serving dynamic live streams at scale
+ Powering India's top social media platform
+ Personalized advertising to players
+ Distribution of game assets in Unreal Engine
Hi! Nice to e-Meet Ya!
Felipe Mendes, Solution Architect at ScyllaDB
+ Published Author on Linux and Databases
+ Helps teams solve their most challenging problems
+ Years of experience with Linux and distributed systems
Agenda
+ Hunting Down a Latency Problem
+ Ticking Time-Series Bomb
+ Hot Facts, Cold Insights
+ Holistic Performance View
Hunting Down a Latency Problem
Uncovering a Multi-Region Performance Challenge
Customer evaluated ScyllaDB and was happy with the results
+ Initial testing:
+ 3-node cluster (AWS us-east-1)
+ P99 < 15ms
+ Using gocql driver:
+ Following query best-practices
+ Making use of gocql.DataCentreHostFilter
+ All application queries using a LOCAL_* ConsistencyLevel
A Latency Sensitive Workload for AdTech
Final production requirements were multi-region (AWS us-west-1)
+ Followed the Adding a New DC to a ScyllaDB Cluster procedure
+ Latency went through the roof!
The PROBLEM
Why?!
+ We realized that latencies only affected a specific ScyllaDB scheduling class
A Few Data Points
+ A path for resolution: main query class P99 driven by the Network RTT
+ nodetool setlogginglevel query_processing trace:
Whoops! Tracking down individual queries
+ Seems like we are getting close. Ideas?
$ for i in *; do grep -EHi "SELECT|UPDATE|INSERT|DELETE" "$i" | awk -F':' '{ print $1, $NF }'; done | sort -V | uniq -c
215 (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)
1 (cached) "SELECT * FROM system_distributed.service_levels;" ()
304 (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)
3 (cached) "SELECT * FROM system_distributed.service_levels;" ()
323 (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)
2 (cached) "SELECT * FROM system_distributed.service_levels;" ()
1 (cached) "SELECT id, data, written_at, version FROM system.batchlog LIMIT 128" ()
7 (cached) "SELECT * FROM system_distributed.service_levels;" ()
1 (cached) "SELECT id, data, written_at, version FROM system.batchlog LIMIT 128" ()
6 (cached) "SELECT * FROM system_distributed.service_levels;" ()
1 (cached) "SELECT id, data, written_at, version FROM system.batchlog LIMIT 128" ()
6 (cached) "SELECT * FROM system_distributed.service_levels;" ()
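The same tally can be sketched in Python for cases where the trace output needs further slicing. This is only an illustration: the log line shape (free text followed by the statement in double quotes) and the sample lines are assumptions modelled loosely on the output above, not the exact ScyllaDB log format.

```python
import re
from collections import Counter

# Assumed line shape: arbitrary prefix, then the CQL statement in double quotes.
STATEMENT = re.compile(r'"((?:SELECT|UPDATE|INSERT|DELETE)\b[^"]*)"', re.IGNORECASE)


def tally_statements(lines):
    """Count how often each quoted CQL statement appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = STATEMENT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines, not real log output.
sample = [
    'query_processing - (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)',
    'query_processing - (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)',
    'query_processing - (cached) "SELECT * FROM system_distributed.service_levels;" ()',
]

for statement, count in tally_statements(sample).most_common():
    print(count, statement)
```

Sorting by frequency makes the dominant statement stand out, which here is the lookup against system_auth.roles.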
+ From ScyllaDB:
The answer lies within the code (and in our docs!)
+ From Enable Authorization:
db::consistency_level
password_authenticator::consistency_for_user(std::string_view role_name) {
    if (role_name == DEFAULT_USER_NAME) {
        return db::consistency_level::QUORUM;
    }
    return db::consistency_level::LOCAL_ONE;
}
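Why a QUORUM lookup only started to hurt once the second region was added can be worked out with a little arithmetic. The sketch below is an illustration in Python, not ScyllaDB code: the data center names and the replication factor of 3 per DC are assumptions, and only the consistency levels relevant to this case are modelled. The "(cassandra)" tag on the traced system_auth.roles queries suggests the default user was in use, which is the branch that returns QUORUM.

```python
def quorum(total_replicas: int) -> int:
    """Replicas needed for a majority of the given replica count."""
    return total_replicas // 2 + 1


def replicas_required(cl: str, rf_by_dc: dict, local_dc: str) -> tuple:
    """Return (required_acks, must_cross_dc) for a consistency level.

    must_cross_dc is True when the local replicas alone cannot satisfy the level.
    """
    local_rf = rf_by_dc[local_dc]
    if cl == "LOCAL_ONE":
        return 1, False
    if cl == "LOCAL_QUORUM":
        return quorum(local_rf), False
    if cl == "QUORUM":
        need = quorum(sum(rf_by_dc.values()))
        return need, need > local_rf
    raise ValueError(f"unsupported consistency level: {cl}")


# Hypothetical topologies: RF=3 in one DC, then RF=3 in each of two DCs.
single_dc = {"us-east-1": 3}
multi_dc = {"us-east-1": 3, "us-west-1": 3}

print(replicas_required("QUORUM", single_dc, "us-east-1"))    # (2, False)
print(replicas_required("QUORUM", multi_dc, "us-east-1"))     # (4, True)
print(replicas_required("LOCAL_ONE", multi_dc, "us-east-1"))  # (1, False)
```

Under these assumptions a QUORUM read needs 4 of 6 replicas once both DCs hold the data, so at least one response has to travel across regions and the query inherits the network RTT. A non-default role would be served at LOCAL_ONE and stay in the local DC.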
Ticking Time-Series Bomb
Rushing against time
Log retention use case in production
+ ScyllaDB Cloud cluster in GCP
+ 6-node cluster
+ Latency within bounds, except during repairs
Major Streaming Company
Cluster repairs were taking a long time to complete
+ Some nodes started to run out of disk space
+ Customer was under a time-sensitive "freeze" period
The PROBLEM
4% free space!
+ Compaction was unable to keep up with the rate of incoming files:
An interesting symptom
What would you do?
+ What we know thus far:
+ Nodes are running out of disk space
+ Compaction is unable to catch up
+ Repairs are taking a long time
Data Modeling Review
+ Making use of Jaeger as an integration:
+ Business-critical application metrics
+ Time to Live (TTL) for automatic data expiration
+ Spreads data using a "bucket" technique, sorted by time
+ Circling back to the basics:
Data Modeling Review
$ cqlsh -e "DESC SCHEMA" | egrep -i "compaction =" | sort -h | uniq -c
43 AND compaction = {'class': 'IncrementalCompactionStrategy'}
33 AND compaction = {'class': 'SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
55 AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS'}
+ 131 tables to run through … 😢
+ … but TWCS tables seem interesting 💪
+ TTL 2592000 (in seconds) == 30 days
+ Split in 1 hour buckets
+ 30 * 24 buckets = 720 windows!
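The window arithmetic can be checked with a few lines of Python. This is a minimal sketch: the helper name is made up for illustration, and it uses the 30-day TTL of 2,592,000 seconds that appears in the table definition.

```python
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def twcs_window_count(ttl_seconds: int, window_size: int, window_unit: str) -> int:
    """Approximate number of live time windows: TTL divided by window length, rounded up."""
    window_seconds = window_size * UNIT_SECONDS[window_unit]
    return -(-ttl_seconds // window_seconds)  # ceiling division


print(twcs_window_count(2592000, 1, "HOURS"))  # 720
print(twcs_window_count(2592000, 1, "DAYS"))   # 30
```

Each window keeps its own set of SSTables until the data expires, so the window count is a rough lower bound on how many SSTables every table has to carry.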
Diving Deeper
CREATE TABLE x.y (
    <...>
) WITH crc_check_chance = 1.0
    AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND default_time_to_live = 2592000
    AND gc_grace_seconds = 10800
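One way to reason about a better setting is to pick the smallest window that keeps the window count under a chosen cap. The Python sketch below is hypothetical: the candidate window sizes and the cap of 50 are assumptions chosen for illustration, and it does not claim to reproduce the schema change that was actually applied.

```python
import math

# Hypothetical candidates, ordered from finest to coarsest.
CANDIDATE_WINDOWS = [(1, "HOURS"), (6, "HOURS"), (12, "HOURS"), (1, "DAYS"), (7, "DAYS")]
UNIT_SECONDS = {"HOURS": 3600, "DAYS": 86400}


def smallest_window_under_cap(ttl_seconds: int, max_windows: int) -> tuple:
    """Return (size, unit, window_count) for the first candidate that fits under the cap."""
    for size, unit in CANDIDATE_WINDOWS:
        windows = math.ceil(ttl_seconds / (size * UNIT_SECONDS[unit]))
        if windows <= max_windows:
            return size, unit, windows
    raise ValueError("no candidate window keeps the count under the cap")


def compaction_clause(size: int, unit: str) -> str:
    """Render the compaction map in the same form the schema uses."""
    return ("{'class': 'TimeWindowCompactionStrategy', "
            f"'compaction_window_size': '{size}', 'compaction_window_unit': '{unit}'}}")


size, unit, windows = smallest_window_under_cap(2592000, max_windows=50)  # cap is an assumption
print(size, unit, windows)  # 1 DAYS 30
print(compaction_clause(size, unit))
```

With a 30-day TTL, hourly, 6-hour and 12-hour windows give 720, 120 and 60 windows respectively, so the first candidate under the assumed cap is a one-day window.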
Understanding the problem
TWCS
Picture from: https://www.pythian.com/blog/proposal-for-a-new-cassandra-cluster-key-compaction-strategy
After schema changes
+ Repair completed, memory utilization reduced, performance improved!
+ Customer explained they were using Jaeger integration "stock" settings
+ We reported and addressed jaegertracing/jaeger/4561 upstream:
+ "With ScyllaDB (and likely Cassandra too), increasing TTL leads to large numbers of sstables"
+ Felipe introduced the "twcs_max_window_count" tunable in ScyllaDB:
+ Safemode - Introduce TimeWindowCompactionStrategy Guardrails
+ Available starting at ScyllaDB Open Source 5.2 & Enterprise 2023.1
+ Takeaways
+ Be suspicious of activities taking a long time to complete
+ Review how third-party integrations work under the hood
+ Rest assured with ScyllaDB Cloud expertise to back you up!
Post event findings (and diligence)
Hot Facts, Cold Insights
Data Thermodynamics & ScyllaDB
Engaged with us after losing data in HBase
+ On-prem bare-metal deployment
+ HUGE storage footprint – Petabyte range
+ Tiered storage or similar requirement:
+ "hot": frequently accessed data, or
+ "cold": least frequently accessed data
+ No retention periods, data is stored forever
Worldwide Feed Aggregator
Which deployment and replication strategies to follow?
+ Single cluster, dual hot/cold DC?
+ Separate clusters, observable replication?
+ Should we use CDC for replication?
+ To dual-write or not?
+ How to evict data from the "hot" cluster?
Challenges
Let's talk strategies
Reasons were plenty:
+ An on-premises deployment makes it fairly difficult to leverage Object Storage effectively
+ Decommissioning the previous HBase infrastructure wasn't an option
+ Performance:
+ "Local" Object Storage latencies were suboptimal
+ Cloud ones would result in:
+ Network RTT penalty
+ Internet traffic ($$)
+ Need to come up with data tiering/replication strategies anyway…
+ … and still figure out how to enable their applications to work with an Object Store
Why not just use Object Storage?
Plan ahead:
+ ScyllaDB's specialized cache (LSA) fully allocates the server's memory
+ Important on-disk components (SSTable metadata) also need to be held in memory
+ Memory-to-disk ratio: 1:30 for performance, 1:100 as an upper bound
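To make the ratios concrete, the sizing can be sketched in Python. The node size of 256 GB of RAM and the 1 PB total footprint (taken as already including replication) are assumptions for illustration only; the 1:30 and 1:100 ratios come from the bullet above.

```python
import math


def storage_per_node_gb(ram_gb: int, ratio: int) -> int:
    """Disk capacity one node can carry at a memory-to-disk ratio of 1:ratio."""
    return ram_gb * ratio


def nodes_needed(dataset_tb: int, ram_gb: int, ratio: int) -> int:
    """Nodes required to hold dataset_tb (replication included) at the given ratio."""
    per_node_tb = storage_per_node_gb(ram_gb, ratio) / 1024
    return math.ceil(dataset_tb / per_node_tb)


RAM_GB = 256        # hypothetical node memory
DATASET_TB = 1024   # hypothetical 1 PB footprint

print(storage_per_node_gb(RAM_GB, 30))        # 7680
print(storage_per_node_gb(RAM_GB, 100))       # 25600
print(nodes_needed(DATASET_TB, RAM_GB, 30))   # 137
print(nodes_needed(DATASET_TB, RAM_GB, 100))  # 41
```

Under these assumptions, the performance ratio needs roughly three times as many nodes as the upper bound, which is why the ratio has to be settled before sizing a petabyte-range cluster.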
Memory and Storage Limits
Ultimately, they decided to apply the following strategy:
+ Application primarily writes to the "hot" DC
+ Replication from "hot" to "cold" is handled asynchronously by ScyllaDB
+ After their cut-off period, simply remove all replicas from the "hot" DC:
ScyllaDB Replication
ALTER KEYSPACE replicated_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'hot': 0, 'cold': 3};
+ A similar strategy was already employed with HBase, so this was a zero-friction point
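The cut-off step can be sketched as a small Python helper that renders the statement from a per-DC replica map. The helper name is made up for illustration, and the replication factor of 3 in the "hot" DC before the cut-off is an assumption; only the post-cut-off statement appears in the slides.

```python
def alter_keyspace_replication(keyspace: str, rf_by_dc: dict) -> str:
    """Render an ALTER KEYSPACE statement for NetworkTopologyStrategy with per-DC replica counts."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_by_dc.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {options}}};")


# Before the cut-off (assumed): replicas in both data centers.
print(alter_keyspace_replication("replicated_keyspace", {"hot": 3, "cold": 3}))

# After the cut-off: no replicas left in the "hot" DC.
print(alter_keyspace_replication("replicated_keyspace", {"hot": 0, "cold": 3}))
```

Generating the statement from a map keeps the two states side by side, which helps when the same transition has to be applied to many keyspaces on a schedule.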
Performance requires a holistic view:
+ An overlooked setting may be the culprit behind performance problems
+ Integrations won't always follow best practices
+ Be sure to make the most of your underlying (and available) infrastructure
+ Context is fundamental to guide your decisions
Summary
Q&A
ScyllaDB Cloud
Start free trial
scylladb.com/cloud
Feb 14-15 | VIRTUAL EVENT
scylladb.com/summit
Virtual Workshop
February 22, 2024
scylladb.com/events
Thank you for joining us today.

More Related Content

Similar to Dissecting Real-World Database Performance Dilemmas

Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?SegFaultConf
 
Upgrading AD from Windows Server 2003 to Windows Server 2008 R2
Upgrading AD from Windows Server 2003 to Windows Server 2008 R2Upgrading AD from Windows Server 2003 to Windows Server 2008 R2
Upgrading AD from Windows Server 2003 to Windows Server 2008 R2Amit Gatenyo
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0ScyllaDB
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSDataStax Academy
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗
2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗
2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗YUCHENG HU
 
How to achieve no compromise performance and availability
How to achieve no compromise performance and availabilityHow to achieve no compromise performance and availability
How to achieve no compromise performance and availabilityScyllaDB
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...Amazon Web Services
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark DownscalingDatabricks
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
Scylla Virtual Workshop 2022
Scylla Virtual Workshop 2022Scylla Virtual Workshop 2022
Scylla Virtual Workshop 2022ScyllaDB
 
Webinar: How to build a highly available time series solution with KairosDB
Webinar: How to build a highly available time series solution with KairosDBWebinar: How to build a highly available time series solution with KairosDB
Webinar: How to build a highly available time series solution with KairosDBScyllaDB
 
Webinar how to build a highly available time series solution with kairos-db (1)
Webinar  how to build a highly available time series solution with kairos-db (1)Webinar  how to build a highly available time series solution with kairos-db (1)
Webinar how to build a highly available time series solution with kairos-db (1)Julia Angell
 
Exploring Phantom Traffic Jams in Your Data Flows
Exploring Phantom Traffic Jams in Your Data Flows Exploring Phantom Traffic Jams in Your Data Flows
Exploring Phantom Traffic Jams in Your Data Flows ScyllaDB
 
Data Science Connect, July 22nd 2014 @IBM Innovation Center Zurich
Data Science Connect, July 22nd 2014 @IBM Innovation Center ZurichData Science Connect, July 22nd 2014 @IBM Innovation Center Zurich
Data Science Connect, July 22nd 2014 @IBM Innovation Center ZurichRomeo Kienzler
 
AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...
AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...
AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...ScyllaDB
 
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019Alex Thissen
 

Similar to Dissecting Real-World Database Performance Dilemmas (20)

Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
Robert Pankowecki - Czy sprzedawcy SQLowych baz nas oszukali?
 
Upgrading AD from Windows Server 2003 to Windows Server 2008 R2
Upgrading AD from Windows Server 2003 to Windows Server 2008 R2Upgrading AD from Windows Server 2003 to Windows Server 2008 R2
Upgrading AD from Windows Server 2003 to Windows Server 2008 R2
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWS
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗
2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗
2012 09 MariaDB Boston Meetup - MariaDB 是 Mysql 的替代者吗
 
Cassandra at teads
Cassandra at teadsCassandra at teads
Cassandra at teads
 
How to achieve no compromise performance and availability
How to achieve no compromise performance and availabilityHow to achieve no compromise performance and availability
How to achieve no compromise performance and availability
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ...
 
Big Data Tools in AWS
Big Data Tools in AWSBig Data Tools in AWS
Big Data Tools in AWS
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Scylla Virtual Workshop 2022
Scylla Virtual Workshop 2022Scylla Virtual Workshop 2022
Scylla Virtual Workshop 2022
 
Webinar: How to build a highly available time series solution with KairosDB
Webinar: How to build a highly available time series solution with KairosDBWebinar: How to build a highly available time series solution with KairosDB
Webinar: How to build a highly available time series solution with KairosDB
 
Webinar how to build a highly available time series solution with kairos-db (1)
Webinar  how to build a highly available time series solution with kairos-db (1)Webinar  how to build a highly available time series solution with kairos-db (1)
Webinar how to build a highly available time series solution with kairos-db (1)
 
Exploring Phantom Traffic Jams in Your Data Flows
Exploring Phantom Traffic Jams in Your Data Flows Exploring Phantom Traffic Jams in Your Data Flows
Exploring Phantom Traffic Jams in Your Data Flows
 
Data Science Connect, July 22nd 2014 @IBM Innovation Center Zurich
Data Science Connect, July 22nd 2014 @IBM Innovation Center ZurichData Science Connect, July 22nd 2014 @IBM Innovation Center Zurich
Data Science Connect, July 22nd 2014 @IBM Innovation Center Zurich
 
AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...
AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...
AdGear Use Case with Scylla - 1M Queries Per Second with Single-Digit Millise...
 
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019
Architecting .NET solutions in a Docker ecosystem - .NET Fest Kyiv 2019
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 
Top NoSQL Data Modeling Mistakes
Top NoSQL Data Modeling MistakesTop NoSQL Data Modeling Mistakes
Top NoSQL Data Modeling MistakesScyllaDB
 
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesNoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesScyllaDB
 
Optimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversOptimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversScyllaDB
 
Overcoming Media Streaming Challenges with NoSQL
Overcoming Media Streaming Challenges with NoSQLOvercoming Media Streaming Challenges with NoSQL
Overcoming Media Streaming Challenges with NoSQLScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual Workshop
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & Tradeoffs
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101
 
Top NoSQL Data Modeling Mistakes
Top NoSQL Data Modeling MistakesTop NoSQL Data Modeling Mistakes
Top NoSQL Data Modeling Mistakes
 
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & PrinciplesNoSQL Data Modeling Foundations — Introducing Concepts & Principles
NoSQL Data Modeling Foundations — Introducing Concepts & Principles
 
Optimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversOptimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database Drivers
 
Overcoming Media Streaming Challenges with NoSQL
Overcoming Media Streaming Challenges with NoSQLOvercoming Media Streaming Challenges with NoSQL
Overcoming Media Streaming Challenges with NoSQL
 

Recently uploaded

APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unraveling Multimodality with Large Language Models.pdf
Dissecting Real-World Database Performance Dilemmas

  • 1. Dissecting Real-World Database Performance Dilemmas Felipe Mendes, Solution Architect at ScyllaDB
  • 2. + Why Disney Moved from DynamoDB to ScyllaDB (Yichen Wei, Senior Software Engineer; Adam Drennan, Senior Software Engineer) + So You've Lost Quorum: Lessons From Accidental Downtime (Bo Ingram, Staff Software Engineer, Persistence Infrastructure) + Inside Expedia's Migration to ScyllaDB for Change Data Capture (Jean Carlo Rivera Ura, NoSQL Database Engineer III) + MongoDB vs ScyllaDB: Tractian’s Experience with Real-Time ML (JP Voltani, Director of Engineering)
  • 3. Poll: How much time did you spend troubleshooting performance issues in your database?
  • 4. The Database for Gamechangers + For data-intensive applications that require high throughput and predictable low latencies + Close-to-the-metal design takes full advantage of modern infrastructure + >5x higher throughput + >20x lower latency + >75% TCO savings + Compatible with Apache Cassandra and Amazon DynamoDB + DBaaS/Cloud, Enterprise and Open Source solutions + “ScyllaDB stands apart...It’s the rare product that exceeds my expectations.” – Martin Heller, InfoWorld contributing editor and reviewer + “For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.” – Adrian Bridgewater, Forbes senior contributor
  • 5. +400 Gamechangers Leverage ScyllaDB: Seamless experiences across content + devices; Digital experiences at massive scale; Corporate fleet management; Real-time analytics; 2,000,000 SKU e-commerce management; Video recommendation management; Threat intelligence service using JanusGraph; Real-time fraud detection across 6M transactions/day; Uber scale, mission critical chat & messaging app; Network security threat detection; Power ~50M X1 DVRs with billions of reqs/day; Precision healthcare via Edison AI; Inventory hub for retail operations; Property listings and updates; Unified ML feature store across the business; Cryptocurrency exchange app; Geography-based recommendations; Global operations (Avon, Body Shop + more); Predictable performance for on-sale surges; GPS-based exercise tracking; Serving dynamic live streams at scale; Powering India's top social media platform; Personalized advertising to players; Distribution of game assets in Unreal Engine
  • 6. Hi! Nice to e-Meet Ya! Felipe Mendes, Solution Architect at ScyllaDB + Published Author on Linux and Databases + Helps teams solve their most challenging problems + Years of experience with Linux and distributed systems
  • 7. Agenda + Hunting Down a Latency Problem + Ticking Time-Series Bomb + Hot Facts, Cold Insights + Holistic Performance View
  • 8. Hunting Down a Latency Problem: Uncovering a Multi-Region Performance Challenge
  • 9. A Latency Sensitive Workload for AdTech: Customer evaluated ScyllaDB and was happy with the results + Initial testing: + 3-node cluster (AWS us-east-1) + P99 < 15ms + Using the gocql driver: + Following query best practices + Making use of gocql.DataCentreHostFilter + All application queries using a LOCAL_* ConsistencyLevel
  • 10. The PROBLEM: Final production requirements were multi-region (AWS us-west-1) + Followed the "Adding a New DC to a ScyllaDB Cluster" procedure + Latency went through the roof!
  • 12. A Few Data Points + We realized that latencies only affected a specific ScyllaDB scheduling class + A path for resolution: main query class P99 driven by the network RTT
  • 13. Whoops! Tracking down individual queries
      + nodetool setlogginglevel query_processing trace:
        for i in *; do egrep -Hi "SELECT|UPDATE|INSERT|DELETE" "$i" | awk -F':' '{ print $1, $NF }'; done | sort -V | uniq -c
            215 (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)
              1 (cached) "SELECT * FROM system_distributed.service_levels;" ()
            304 (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)
              3 (cached) "SELECT * FROM system_distributed.service_levels;" ()
            323 (cached) "SELECT * FROM system_auth.roles WHERE role = ?" (cassandra)
              2 (cached) "SELECT * FROM system_distributed.service_levels;" ()
              1 (cached) "SELECT id, data, written_at, version FROM system.batchlog LIMIT 128" ()
              7 (cached) "SELECT * FROM system_distributed.service_levels;" ()
              1 (cached) "SELECT id, data, written_at, version FROM system.batchlog LIMIT 128" ()
              6 (cached) "SELECT * FROM system_distributed.service_levels;" ()
              1 (cached) "SELECT id, data, written_at, version FROM system.batchlog LIMIT 128" ()
              6 (cached) "SELECT * FROM system_distributed.service_levels;" ()
      + Seems like we are getting close. Ideas?
  • 14. The answer lies within the code (and in our docs!)
      + From ScyllaDB:
        db::consistency_level password_authenticator::consistency_for_user(std::string_view role_name) {
            if (role_name == DEFAULT_USER_NAME) {
                return db::consistency_level::QUORUM;
            }
            return db::consistency_level::LOCAL_ONE;
        }
      + From Enable Authorization: [documentation excerpt shown on the slide]
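The C++ snippet on slide 14 can be read alongside the trace on slide 13, where the `system_auth.roles` lookups run as the `cassandra` role. The following is a minimal Python sketch of that reasoning, not ScyllaDB code: it mirrors `consistency_for_user` and applies the standard quorum formula (half of all replicas, rounded down, plus one). The replica counts per datacenter and the role name `app_user` are assumptions chosen for illustration; they are not taken from the customer's cluster.

```python
# Hypothetical replica layout: these counts are assumptions for illustration,
# not values taken from the customer's cluster.
DEFAULT_USER_NAME = "cassandra"


def consistency_for_user(role_name: str) -> str:
    """Mirror of the C++ snippet: the default superuser is looked up at QUORUM."""
    return "QUORUM" if role_name == DEFAULT_USER_NAME else "LOCAL_ONE"


def replicas_required(consistency: str, replicas_per_dc: dict[str, int]) -> int:
    """Number of replicas that must answer at the given consistency level."""
    if consistency == "QUORUM":
        # QUORUM counts replicas across every datacenter.
        return sum(replicas_per_dc.values()) // 2 + 1
    if consistency == "LOCAL_ONE":
        return 1
    raise ValueError(f"unsupported consistency level: {consistency}")


def must_cross_datacenters(role_name: str, local_dc: str,
                           replicas_per_dc: dict[str, int]) -> bool:
    """True when the local datacenter alone cannot satisfy the role lookup."""
    level = consistency_for_user(role_name)
    if level == "LOCAL_ONE":
        return False
    return replicas_required(level, replicas_per_dc) > replicas_per_dc[local_dc]


layout = {"us-east-1": 3, "us-west-1": 3}
for role in ("cassandra", "app_user"):
    print(role, consistency_for_user(role),
          must_cross_datacenters(role, "us-east-1", layout))
```

Under these assumed counts, a single three-replica datacenter satisfies QUORUM locally (2 of 3), while two such datacenters need 4 of 6 replicas, so every lookup for the default role has to wait on the remote region. A non-default role stays at LOCAL_ONE in both layouts.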
  • 16. Major Streaming Company: Log retention use case in production + ScyllaDB Cloud cluster in GCP + 6-node cluster + Latency within bounds, except during repairs
  • 17. The PROBLEM: Cluster repairs were taking a long time to complete + Some nodes started to run out of disk space (4% free space!) + Customer was under a time-sensitive "freeze" period
  • 18. An interesting symptom + Compaction was unable to keep up with the rate of incoming files
  • 20. Data Modeling Review + What we know thus far: + Nodes are running out of disk space + Compaction is unable to catch up + Repair is taking a long time
  • 21. Data Modeling Review
      + Making use of Jaeger as an integration:
      + Business-critical application metrics
      + Time to Live (TTL) for automatic data expiration
      + Spreads data using a "bucket" technique, sorted by time
      + Circling back to the basics:
        $ cqlsh -e "DESC SCHEMA" | egrep -i "compaction =" | sort -h | uniq -c
             43 AND compaction = {'class': 'IncrementalCompactionStrategy'}
             33 AND compaction = {'class': 'SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
             55 AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS'}
      + 131 tables to run through … 😢
      + … but TWCS tables seem interesting 💪
  • 22. Diving Deeper
      + TTL 2592000 (in seconds) == 30 days
      + Split in 1 hour buckets
      + 30 * 24 buckets = 720 windows!
        CREATE TABLE x.y (
            <...>
        ) WITH crc_check_chance = 1.0
            AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '1', 'compaction_window_unit': 'HOURS'}
            AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
            AND default_time_to_live = 2592000
            AND gc_grace_seconds = 10800
  • 23. Understanding the problem: TWCS (picture from: https://www.pythian.com/blog/proposal-for-a-new-cassandra-cluster-key-compaction-strategy)
  • 24. After schema changes + Repair completed, memory utilization reduced, performance improved!
  • 25. Post-event findings (and diligence) + Customer explained they were using the Jaeger integration's "stock" settings + We reported and addressed jaegertracing/jaeger/4561 upstream: + "With ScyllaDB (and likely Cassandra too), increasing TTL leads to large numbers of sstables" + Felipe introduced the "twcs_max_window_count" tunable in ScyllaDB: + Safemode - Introduce TimeWindowCompactionStrategy Guardrails + Available starting at ScyllaDB Open Source 5.2 & Enterprise 2023.1 + Takeaways: + Be suspicious of activities taking a long time to complete + Review how third-party integrations work under the hood + Rest assured with ScyllaDB Cloud expertise to back you up!
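The window arithmetic from slide 22 (TTL divided by the TWCS window size) can be sketched in a few lines of Python. This is an illustrative calculation only: the function names and the `TARGET_WINDOWS` budget are hypothetical and are not ScyllaDB settings or defaults. The only inputs taken from the slides are `default_time_to_live = 2592000` and the 1-hour window.

```python
import math

UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def twcs_window_count(ttl_seconds: int, window_size: int, window_unit: str) -> int:
    """Time windows that can hold live data before the TTL expires it."""
    window_seconds = window_size * UNIT_SECONDS[window_unit]
    return math.ceil(ttl_seconds / window_seconds)


def smallest_window_hours(ttl_seconds: int, max_windows: int) -> int:
    """Smallest whole-hour window that keeps the count at or below max_windows."""
    return math.ceil(ttl_seconds / (max_windows * 3600))


ttl = 2592000  # default_time_to_live from the table definition (30 days)
print(twcs_window_count(ttl, 1, "HOURS"))  # 720, matching 30 * 24 on the slide

# TARGET_WINDOWS is a hypothetical budget chosen for this example only.
TARGET_WINDOWS = 30
hours = smallest_window_hours(ttl, TARGET_WINDOWS)
print(hours, twcs_window_count(ttl, hours, "HOURS"))
```

With the hypothetical budget of 30 windows, the same 30-day TTL would call for a 24-hour window rather than a 1-hour one. The right budget for a real table depends on the workload and on the guardrail configured in the cluster.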
  • 26. Hot Facts, Cold Insights: Data Thermodynamics & ScyllaDB
  • 27. Worldwide Feed Aggregator: Engaged with us after losing data in HBase + On-prem bare-metal deployment + HUGE storage footprint: petabyte range + Tiered storage or similar requirement: + "hot": frequently accessed data; or + "cold": least accessed data + No retention periods, data is stored forever
  • 28. Challenges: Which deployment and replication strategies to follow? + Single cluster, dual hot/cold DC? + Separate clusters, observable replication? + Should we use CDC for replication? + To dual-write or not? + How to evict data from the "hot" cluster?
  • 30. Why not just use Object Storage? Reasons were plenty: + An on-premise deployment makes it fairly difficult to leverage effectively + Decommissioning the previous HBase infrastructure wasn't an option + Performance: + "Local" Object Storage latencies were suboptimal + Cloud ones would result in: + Network RTT penalty + Internet traffic ($$) + Need to come up with data tiering/replication strategies anyway… + … and still figure out how to enable their applications to work with an Object Store
  • 31. Memory and Storage Limits: Plan ahead + ScyllaDB's specialized cache fully allocates the server's memory (LSA) + Important on-disk components (SSTable metadata) need to be stored in memory + Memory-to-storage ratio: 1:30 for performance, 1:100 as an upper bound
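As a rough worked example of the 1:30 and 1:100 figures on slide 31, the sketch below assumes they are memory-to-storage ratios (1 GB of RAM per 30 GB or 100 GB of disk). The 256 GB RAM node size and the helper names are hypothetical values picked for illustration; they do not describe the customer's hardware.

```python
PERFORMANCE_RATIO = 30   # 1:30, "for performance" on the slide
UPPER_BOUND_RATIO = 100  # 1:100, "upper bound" on the slide


def max_storage_gb(ram_gb: float, ratio: int) -> float:
    """Disk capacity per node implied by a 1:ratio memory-to-storage ratio."""
    return ram_gb * ratio


def classify(ram_gb: float, disk_gb: float) -> str:
    """Place a node's disk/RAM ratio relative to the two figures on the slide."""
    ratio = disk_gb / ram_gb
    if ratio <= PERFORMANCE_RATIO:
        return "within performance ratio"
    if ratio <= UPPER_BOUND_RATIO:
        return "within upper bound"
    return "exceeds upper bound"


ram_gb = 256  # hypothetical node size, for illustration only
print(max_storage_gb(ram_gb, PERFORMANCE_RATIO))  # 7680 GB per node at 1:30
print(max_storage_gb(ram_gb, UPPER_BOUND_RATIO))  # 25600 GB per node at 1:100
print(classify(ram_gb, 20000))                    # within upper bound
```

For a petabyte-range footprint, the gap between the two ratios translates directly into node count, which is why the slide asks for planning ahead.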
  • 32. ScyllaDB Replication: Ultimately, they decided to apply the following strategy:
      + Application primarily writes to the "hot" DC
      + Replication from "hot" to "cold" is asynchronously handled by ScyllaDB
      + After their cut-off period, simply remove all replicas from the "hot" DC:
        ALTER KEYSPACE replicated_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'hot': 0, 'cold': 3};
      + Similar strategy already employed in HBase, zero friction point
  • 33. Summary: Performance requires a holistic view + An overlooked setting may be the culprit behind performance problems + Integrations won't always follow best practices + Be sure to make the most of your underlying (and available) infrastructure + Context is fundamental to guide your decisions
  • 34. Q&A ScyllaDB Cloud Start free trial scylladb.com/cloud Feb 14-15 | VIRTUAL EVENT scylladb.com/summit Virtual Workshop February 22, 2024 scylladb.com/events
  • 35. Thank you for joining us today. @scylladb scylladb/ slack.scylladb.com @scylladb company/scylladb/ scylladb/