SlideShare a Scribd company logo
Brought to you by
Avoiding Data Hotspots
At Scale
Konstantin Osipov
Engineering at
Konstantin Osipov
Director of Engineering
■ Worked on lightweight transactions in Scylla
■ Rarely happy with the status quo (AKA the stubborn one)
■ A very happy father
■ Career and public speaking coach
RUM conjecture and scalability
What this talk is not
● replication
● Re-sharding and re-balancing data
● distributed queries & jobs
will focus on principles data distribution only
Ways to shard
Define sharding
Sharding - horizontal partitioning of data across multiple servers. Can be used to
scale capacity and (possibly) throughput of the database. 3 key challenges:
● Choosing a way to split data across nodes
● Re-balancing data and maintaining location information
● Routing queries to the data
Hash based sharding
Hash
ring
Hashed keys
Consistent hash Ketama hash
Sharding: hash + virtual buckets in Couchbase
Sharding: chunk splits and migrations in
MongoDB
Hotspots
Range based sharding
Sharding: ranges in CockroachDB
mongodb
For queries that don’t include the shard key, mongos must query all shards, wait
for their response and then return the result to the application. These
“scatter/gather” queries can be long running operations.
However, range based partitioning can result in an uneven distribution of data,
which may negate some of the benefits of sharding. For example, if the shard key
is a linearly increasing field, such as time, then all requests for a given time range
will map to the same chunk, and thus the same shard. In this situation, a small set
of shards may receive the majority of requests and the system would not scale
very well.
spanner
One cause of hotspots is having a column whose value monotonically increases
as the first key part, because this results in all inserts occurring at the end of your
key space. This pattern is undesirable because Cloud Spanner divides data among
servers by key ranges, which means all your inserts will be directed at a single
server that will end up doing all the work.
Avoiding hotspots
Bit-reversing the partition key
Descending order for timestamp-based keys
CREATE TABLE UserAccessLog (
UserId INT64 NOT NULL,
LastAccess TIMESTAMP NOT NULL,
...
) PRIMARY KEY (UserId, LastAccess DESC);
Replicating dimension tables everywhere
voltdb
To further optimize performance, VoltDB allows selected tables to be replicated
on all partitions of the cluster. This strategy minimizes cross-partition join
operations. For example, a retail merchandising database that uses product codes
as the primary key may have one table that simply correlates the product code
with the product's category and full name. Since this table is relatively small and
does not change frequently (unlike inventory and orders) it can be replicated to all
partitions. This way stored procedures can retrieve and return user-friendly
product information when searching by product code without impacting the
performance of order and inventory updates and searches.
Good and bad shard keys
■ good: user session, shopping order
■ maybe: user_id (if user data isn’t too thick)
■ Better: (user_id, post_id)
■ bad: inventory item, order date
Special cases
Scaling a message queue
Scaling in a data warehouse
■ Data warehouses usually don’t check unique constraints
■ Data is sorted multiple times, according to multiple dimensions
■ Sharding can be done according to a hash of multiple fields
Let’s recap
Summary: design choices
Hash Range
Write heavy/monotonic//time
series
Linear scaling Hotspots
Primary key read Linear scaling Linear scaling
Partial key read Hotspots Linear scaling
Indexed range read Hotspots Linear scaling
Non-indexed read Hotspots Hotspots
Brought to you by
Konstantin Osipov
kostja@scylladb.com
@kostja_osipov

More Related Content

What's hot

Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
ScyllaDB
 
Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And Beyond
Romain Hardouin
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
ScyllaDB
 
Update on Crimson - the Seastarized Ceph - Seastar Summit
Update on Crimson  - the Seastarized Ceph - Seastar SummitUpdate on Crimson  - the Seastarized Ceph - Seastar Summit
Update on Crimson - the Seastarized Ceph - Seastar Summit
ScyllaDB
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
Gluster.org
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and Beyond
ScyllaDB
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
ScyllaDB
 
Object Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldObject Compaction in Cloud for High Yield
Object Compaction in Cloud for High Yield
ScyllaDB
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
ScyllaDB
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with Scylla
ScyllaDB
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan Rossi
Gluster.org
 
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
DevOpsDays Tel Aviv
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpana
Gluster.org
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
Gluster.org
 
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiRADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
Ceph Community
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
ScyllaDB
 
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Ceph Community
 
Is It Faster to Go with Redpanda Transactions than Without Them?!
Is It Faster to Go with Redpanda Transactions than Without Them?!Is It Faster to Go with Redpanda Transactions than Without Them?!
Is It Faster to Go with Redpanda Transactions than Without Them?!
ScyllaDB
 
Whoops! I Rewrote It in Rust
Whoops! I Rewrote It in RustWhoops! I Rewrote It in Rust
Whoops! I Rewrote It in Rust
ScyllaDB
 

What's hot (20)

Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And Beyond
 
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
Scylla Summit 2018: Rebuilding the Ceph Distributed Storage Solution with Sea...
 
Update on Crimson - the Seastarized Ceph - Seastar Summit
Update on Crimson  - the Seastarized Ceph - Seastar SummitUpdate on Crimson  - the Seastarized Ceph - Seastar Summit
Update on Crimson - the Seastarized Ceph - Seastar Summit
 
State of Gluster Performance
State of Gluster PerformanceState of Gluster Performance
State of Gluster Performance
 
G1: To Infinity and Beyond
G1: To Infinity and BeyondG1: To Infinity and Beyond
G1: To Infinity and Beyond
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Object Compaction in Cloud for High Yield
Object Compaction in Cloud for High YieldObject Compaction in Cloud for High Yield
Object Compaction in Cloud for High Yield
 
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
How we got to 1 millisecond latency in 99% under repair, compaction, and flus...
 
How to be Successful with Scylla
How to be Successful with ScyllaHow to be Successful with Scylla
How to be Successful with Scylla
 
Life as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan RossiLife as a GlusterFS Consultant with Ivan Rossi
Life as a GlusterFS Consultant with Ivan Rossi
 
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
10 Devops-Friendly Database Must-Haves - Dor Laor, ScyllaDB - DevOpsDays Tel ...
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpana
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu ChaiRADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
RADOS improvements and roadmap - Greg Farnum, Josh Durgin, Kefu Chai
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...Keeping Latency Low and Throughput High with Application-level Priority Manag...
Keeping Latency Low and Throughput High with Application-level Priority Manag...
 
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
Common Support Issues And How To Troubleshoot Them - Michael Hackett, Vikhyat...
 
Is It Faster to Go with Redpanda Transactions than Without Them?!
Is It Faster to Go with Redpanda Transactions than Without Them?!Is It Faster to Go with Redpanda Transactions than Without Them?!
Is It Faster to Go with Redpanda Transactions than Without Them?!
 
Whoops! I Rewrote It in Rust
Whoops! I Rewrote It in RustWhoops! I Rewrote It in Rust
Whoops! I Rewrote It in Rust
 

Similar to What We Need to Unlearn about Persistent Storage

Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Amazon Web Services
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
Beyond Aurora. Scale-out SQL databases for AWS
Beyond Aurora. Scale-out SQL databases for AWS Beyond Aurora. Scale-out SQL databases for AWS
Beyond Aurora. Scale-out SQL databases for AWS
Clustrix
 
DB2 LUW V11.1 CERTIFICATION TRAINING PART #1
DB2 LUW V11.1 CERTIFICATION TRAINING PART #1DB2 LUW V11.1 CERTIFICATION TRAINING PART #1
DB2 LUW V11.1 CERTIFICATION TRAINING PART #1
sunildupakuntla
 
Enterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison PillEnterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison Pill
Billy Newport
 
CS636-olap.ppt
CS636-olap.pptCS636-olap.ppt
CS636-olap.ppt
Iftikharbaig7
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
Amazon Web Services
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
Carol McDonald
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
SnapLogic
 
Mohan Testing
Mohan TestingMohan Testing
Mohan Testingsmittal81
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
HBaseCon
 
Building scalable application with sql server
Building scalable application with sql serverBuilding scalable application with sql server
Building scalable application with sql serverChris Adkin
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Amazon Web Services
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
Kel Graham
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designCalpont
 
Handling the growth of data
Handling the growth of dataHandling the growth of data
Handling the growth of data
Piyush Katariya
 
GIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLsGIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLs
techmaddy
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
政宏 张
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
Amazon Web Services
 

Similar to What We Need to Unlearn about Persistent Storage (20)

Best Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon RedshiftBest Practices for Migrating Your Data Warehouse to Amazon Redshift
Best Practices for Migrating Your Data Warehouse to Amazon Redshift
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Beyond Aurora. Scale-out SQL databases for AWS
Beyond Aurora. Scale-out SQL databases for AWS Beyond Aurora. Scale-out SQL databases for AWS
Beyond Aurora. Scale-out SQL databases for AWS
 
DB2 LUW V11.1 CERTIFICATION TRAINING PART #1
DB2 LUW V11.1 CERTIFICATION TRAINING PART #1DB2 LUW V11.1 CERTIFICATION TRAINING PART #1
DB2 LUW V11.1 CERTIFICATION TRAINING PART #1
 
Enterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison PillEnterprise NoSQL: Silver Bullet or Poison Pill
Enterprise NoSQL: Silver Bullet or Poison Pill
 
CS636-olap.ppt
CS636-olap.pptCS636-olap.ppt
CS636-olap.ppt
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Mohan Testing
Mohan TestingMohan Testing
Mohan Testing
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce Argus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
Building scalable application with sql server
Building scalable application with sql serverBuilding scalable application with sql server
Building scalable application with sql server
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
A tour of Amazon Redshift
A tour of Amazon RedshiftA tour of Amazon Redshift
A tour of Amazon Redshift
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
 
Handling the growth of data
Handling the growth of dataHandling the growth of data
Handling the growth of data
 
GIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLsGIDS 2016 Understanding and Building No SQLs
GIDS 2016 Understanding and Building No SQLs
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data AnalyticsAWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
AWS June 2016 Webinar Series - Amazon Redshift or Big Data Analytics
 

More from ScyllaDB

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
ScyllaDB
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
ScyllaDB
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
ScyllaDB
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
ScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
ScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
ScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
ScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
ScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
ScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
ScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
ScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
ScyllaDB
 

More from ScyllaDB (20)

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance Dilemmas
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
 

Recently uploaded

GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Vladimir Iglovikov, Ph.D.
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 

Recently uploaded (20)

GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 

What We Need to Unlearn about Persistent Storage

  • 1. Brought to you by Avoiding Data Hotspots At Scale Konstantin Osipov Engineering at
  • 2. Konstantin Osipov Director of Engineering ■ Worked on lightweight transactions in Scylla ■ Rarely happy with the status quo (AKA the stubborn one) ■ A very happy father ■ Career and public speaking coach
  • 3. RUM conjecture and scalability
  • 4. What this talk is not ● replication ● Re-sharding and re-balancing data ● distributed queries & jobs will focus on principles data distribution only
  • 6. Define sharding Sharding - horizontal partitioning of data across multiple servers. Can be used to scale capacity and (possibly) throughput of the database. 3 key challenges: ● Choosing a way to split data across nodes ● Re-balancing data and maintaining location information ● Routing queries to the data
  • 7. Hash based sharding Hash ring Hashed keys Consistent hash Ketama hash
  • 8. Sharding: hash + virtual buckets in Couchbase
  • 9. Sharding: chunk splits and migrations in MongoDB
  • 12. Sharding: ranges in CockroachDB
  • 13. mongodb For queries that don’t include the shard key, mongos must query all shards, wait for their response and then return the result to the application. These “scatter/gather” queries can be long running operations. However, range based partitioning can result in an uneven distribution of data, which may negate some of the benefits of sharding. For example, if the shard key is a linearly increasing field, such as time, then all requests for a given time range will map to the same chunk, and thus the same shard. In this situation, a small set of shards may receive the majority of requests and the system would not scale very well.
  • 14. spanner One cause of hotspots is having a column whose value monotonically increases as the first key part, because this results in all inserts occurring at the end of your key space. This pattern is undesirable because Cloud Spanner divides data among servers by key ranges, which means all your inserts will be directed at a single server that will end up doing all the work.
  • 17. Descending order for timestamp-based keys CREATE TABLE UserAccessLog ( UserId INT64 NOT NULL, LastAccess TIMESTAMP NOT NULL, ... ) PRIMARY KEY (UserId, LastAccess DESC);
  • 19. voltdb To further optimize performance, VoltDB allows selected tables to be replicated on all partitions of the cluster. This strategy minimizes cross-partition join operations. For example, a retail merchandising database that uses product codes as the primary key may have one table that simply correlates the product code with the product's category and full name. Since this table is relatively small and does not change frequently (unlike inventory and orders) it can be replicated to all partitions. This way stored procedures can retrieve and return user-friendly product information when searching by product code without impacting the performance of order and inventory updates and searches.
  • 20. Good and bad shard keys ■ good: user session, shopping order ■ maybe: user_id (if user data isn’t too thick) ■ Better: (user_id, post_id) ■ bad: inventory item, order date
  • 23. Scaling in a data warehouse ■ Data warehouses usually don’t check unique constraints ■ Data is sorted multiple times, according to multiple dimensions ■ Sharding can be done according to a hash of multiple fields
  • 25. Summary: design choices Hash Range Write heavy/monotonic//time series Linear scaling Hotspots Primary key read Linear scaling Linear scaling Partial key read Hotspots Linear scaling Indexed range read Hotspots Linear scaling Non-indexed read Hotspots Hotspots
  • 26. Brought to you by Konstantin Osipov kostja@scylladb.com @kostja_osipov