Brought to you by
Avoiding Data Hotspots
At Scale
Konstantin Osipov
Engineering at ScyllaDB
Konstantin Osipov
Director of Engineering
■ Worked on lightweight transactions in Scylla
■ Rarely happy with the status quo (AKA the stubborn one)
■ A very happy father
■ Career and public speaking coach
RUM conjecture and scalability
What this talk is not
● Replication
● Re-sharding and re-balancing data
● Distributed queries & jobs
This talk will focus on the principles of data distribution only.
Ways to shard
Define sharding
Sharding is the horizontal partitioning of data across multiple servers. It can be used
to scale the capacity and (possibly) the throughput of the database. Three key
challenges, with a minimal sketch after the list:
● Choosing a way to split data across nodes
● Re-balancing data and maintaining location information
● Routing queries to the data
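To make the three challenges concrete, here is a minimal sketch in Python (not from the slides; all names are illustrative) of the simplest scheme: a split function, a location map, and a router.

import hashlib

NUM_SHARDS = 4
# Location information: which server owns which shard (illustrative names).
shard_to_server = {0: "db-0", 1: "db-1", 2: "db-2", 3: "db-3"}

def shard_of(key: str) -> int:
    # Splitting the data: hash the key and take it modulo the shard count.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def route(key: str) -> str:
    # Routing a query: look up the owner of the key's shard.
    return shard_to_server[shard_of(key)]

print(route("user:42"))  # e.g. "db-2"

The missing piece is re-balancing: with a plain hash-mod-N split, changing NUM_SHARDS remaps almost every key, which is what consistent hashing and virtual buckets (next slides) are designed to avoid.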
Hash-based sharding
[Diagram: keys hashed onto a hash ring; consistent hashing (Ketama hash)]
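A minimal sketch of consistent hashing with virtual nodes, Ketama-style (illustrative, not any library's actual code): each server is hashed onto the ring at many points, and a key belongs to the first server point clockwise from the key's hash.

import bisect
import hashlib

def ring_hash(value: str) -> int:
    # 32-bit ring position derived from MD5, as Ketama-style hashing does.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Each server appears `vnodes` times on the ring to smooth the load.
        self.points = sorted((ring_hash(f"{s}#{i}"), s)
                             for s in servers for i in range(vnodes))
        self.positions = [pos for pos, _ in self.points]

    def owner(self, key: str) -> str:
        # First ring point clockwise from hash(key), wrapping around the ring.
        i = bisect.bisect(self.positions, ring_hash(key)) % len(self.points)
        return self.points[i][1]

ring = HashRing(["db-0", "db-1", "db-2"])
print(ring.owner("user:42"))

Adding or removing a server only moves the keys adjacent to its ring points, instead of remapping almost every key the way hash-mod-N does.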
Sharding: hash + virtual buckets in Couchbase
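A sketch of the virtual-bucket idea (illustrative Python, not Couchbase's code): the key is hashed to one of a fixed number of vBuckets, and a separate table maps vBuckets to servers. Couchbase uses 1024 vBuckets and a CRC32-based hash; everything else here is made up for illustration.

import zlib

NUM_VBUCKETS = 1024
servers = ["db-0", "db-1", "db-2"]

# vBucket -> server map. Re-balancing only rewrites this table and moves the
# affected vBuckets; the key -> vBucket function never changes.
vbucket_map = {vb: servers[vb % len(servers)] for vb in range(NUM_VBUCKETS)}

def vbucket_of(key: str) -> int:
    return zlib.crc32(key.encode()) % NUM_VBUCKETS

def route(key: str) -> str:
    return vbucket_map[vbucket_of(key)]

print(route("user:42"))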
Sharding: chunk splits and migrations in MongoDB
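A rough sketch of how range chunks behave (illustrative Python, not MongoDB internals): each chunk owns a half-open key range, a lookup finds the covering chunk, and a chunk that grows past a threshold is split so the balancer can later migrate one half to another shard.

import bisect

MAX_CHUNK_KEYS = 4  # tiny split threshold, just for illustration

# Each chunk owns the key range [lo, next chunk's lo); here we also keep the
# actual keys so splits are exact.
chunks = [{"lo": 0, "shard": "shard-A", "keys": []}]

def insert(key: int) -> None:
    # The covering chunk is the last one whose lower bound is <= key.
    i = bisect.bisect_right([c["lo"] for c in chunks], key) - 1
    chunk = chunks[i]
    bisect.insort(chunk["keys"], key)
    if len(chunk["keys"]) > MAX_CHUNK_KEYS:
        # Split in the middle; the balancer may later migrate the new chunk
        # to a less loaded shard.
        half = len(chunk["keys"]) // 2
        chunks.insert(i + 1, {"lo": chunk["keys"][half], "shard": chunk["shard"],
                              "keys": chunk["keys"][half:]})
        chunk["keys"] = chunk["keys"][:half]

for k in range(20):
    insert(k)  # monotonically increasing keys
print([(c["lo"], c["shard"], len(c["keys"])) for c in chunks])
# Every split happens in the last chunk: the classic range-sharding hotspot.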
Hotspots
Range-based sharding
Sharding: ranges in CockroachDB
MongoDB
For queries that don’t include the shard key, mongos must query all shards, wait
for their response and then return the result to the application. These
“scatter/gather” queries can be long running operations.
However, range based partitioning can result in an uneven distribution of data,
which may negate some of the benefits of sharding. For example, if the shard key
is a linearly increasing field, such as time, then all requests for a given time range
will map to the same chunk, and thus the same shard. In this situation, a small set
of shards may receive the majority of requests and the system would not scale
very well.
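A minimal sketch of the scatter/gather pattern described above (illustrative Python, not mongos): when the query contains the shard key it is routed to a single shard, otherwise it has to be fanned out to every shard and the partial results merged.

class Shard:
    def __init__(self):
        self.docs = []
    def find(self, query):
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]

shards = [Shard() for _ in range(3)]

def shard_for(user_id):
    return shards[hash(user_id) % len(shards)]

def insert(doc):
    shard_for(doc["user_id"]).docs.append(doc)

def find(query):
    if "user_id" in query:
        # Targeted query: the shard key is present, only one shard is touched.
        return shard_for(query["user_id"]).find(query)
    # Scatter/gather: the router must ask every shard and merge the answers.
    return [d for s in shards for d in s.find(query)]

insert({"user_id": 7, "item": "book"})
insert({"user_id": 8, "item": "pen"})
print(find({"user_id": 7}))   # one shard queried
print(find({"item": "pen"}))  # all shards queried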
Spanner
One cause of hotspots is having a column whose value monotonically increases
as the first key part, because this results in all inserts occurring at the end of your
key space. This pattern is undesirable because Cloud Spanner divides data among
servers by key ranges, which means all your inserts will be directed at a single
server that will end up doing all the work.
Avoiding hotspots
Bit-reversing the partition key
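One way to do this (a sketch, not any particular database's built-in function): reverse the bits of a monotonically increasing key, so consecutive values land far apart in the key space while the mapping stays reversible.

def bit_reverse(value: int, width: int = 64) -> int:
    # Mirror the low `width` bits of `value`; applying the function twice at
    # the same width returns the original value.
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result

# Consecutive sequence numbers spread across the whole 64-bit key space:
for seq in range(1, 4):
    print(seq, hex(bit_reverse(seq)))
# 1 0x8000000000000000
# 2 0x4000000000000000
# 3 0xc000000000000000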
Descending order for timestamp-based keys
CREATE TABLE UserAccessLog (
UserId INT64 NOT NULL,
LastAccess TIMESTAMP NOT NULL,
...
) PRIMARY KEY (UserId, LastAccess DESC);
Replicating dimension tables everywhere
VoltDB
To further optimize performance, VoltDB allows selected tables to be replicated
on all partitions of the cluster. This strategy minimizes cross-partition join
operations. For example, a retail merchandising database that uses product codes
as the primary key may have one table that simply correlates the product code
with the product's category and full name. Since this table is relatively small and
does not change frequently (unlike inventory and orders) it can be replicated to all
partitions. This way stored procedures can retrieve and return user-friendly
product information when searching by product code without impacting the
performance of order and inventory updates and searches.
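The same idea in a toy Python sketch (illustrative, not VoltDB's mechanism): the large, frequently updated orders table is partitioned by product code, while the small dimension table is copied to every partition, so the lookup that adds the product name never leaves a single partition.

NUM_PARTITIONS = 3

# The small, rarely changing dimension table is replicated on every partition.
product_names = {"P-1": "Blue mug", "P-2": "Desk lamp"}
partitions = [{"orders": [], "products": dict(product_names)}
              for _ in range(NUM_PARTITIONS)]

def partition_of(product_code: str) -> dict:
    return partitions[hash(product_code) % NUM_PARTITIONS]

def add_order(product_code: str, qty: int) -> None:
    partition_of(product_code)["orders"].append((product_code, qty))

def orders_with_names(product_code: str):
    # Single-partition lookup: both the orders and the product name are local.
    p = partition_of(product_code)
    return [(p["products"][code], qty)
            for code, qty in p["orders"] if code == product_code]

add_order("P-2", 5)
print(orders_with_names("P-2"))  # [('Desk lamp', 5)]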
Good and bad shard keys
■ Good: user session, shopping order
■ Maybe: user_id (if user data isn’t too thick)
■ Better: (user_id, post_id), as sketched below
■ Bad: inventory item, order date
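Why the composite key can be better: hashing (user_id, post_id) together spreads one very heavy user's posts across shards, at the price of losing single-shard access to all of that user's data. A toy comparison (illustrative Python):

import hashlib
from collections import Counter

NUM_SHARDS = 8

def shard(*key_parts) -> int:
    digest = hashlib.md5("|".join(map(str, key_parts)).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# One very "thick" user with 10,000 posts.
posts = [("user-1", post_id) for post_id in range(10_000)]

by_user = Counter(shard(u) for u, _ in posts)          # shard key: user_id
by_user_post = Counter(shard(u, p) for u, p in posts)  # shard key: (user_id, post_id)

print(by_user)       # all 10,000 posts on one shard: a hotspot
print(by_user_post)  # roughly 1,250 posts on each of the 8 shards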
Special cases
Scaling a message queue
Scaling in a data warehouse
■ Data warehouses usually don’t check unique constraints
■ Data is sorted multiple times, according to multiple dimensions
■ Sharding can be done according to a hash of multiple fields (see the sketch below)
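A sketch of the multi-field hash (illustrative Python): combining several columns into one hash means no single dimension, such as a monotonically increasing date, decides placement on its own.

import hashlib
from collections import Counter

NUM_SHARDS = 16

def shard_key(*fields) -> int:
    # Combine several columns into one hash value.
    raw = "|".join(map(str, fields)).encode()
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big") % NUM_SHARDS

# Fact rows arriving in date order: (sale_date, store_id, product_id).
rows = [(f"2024-01-{d:02}", s, p)
        for d in range(1, 31) for s in range(10) for p in range(20)]

print(Counter(shard_key(*r) for r in rows))  # close to uniform across 16 shards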
Let’s recap
Summary: design choices
Workload                                 Hash              Range
Write heavy / monotonic / time series    Linear scaling    Hotspots
Primary key read                         Linear scaling    Linear scaling
Partial key read                         Hotspots          Linear scaling
Indexed range read                       Hotspots          Linear scaling
Non-indexed read                         Hotspots          Hotspots
Brought to you by
Konstantin Osipov
kostja@scylladb.com
@kostja_osipov
