5. DataPrime Query Engine
■ Custom distributed query engine for a proprietary query language (DataPrime) on arbitrary semi-structured data
■ Querying data stored in object storage
■ Storage format is specialized parquet files
6. Metastore: Motivation
■ Reading parquet metadata from object storage is too expensive for large queries
■ Move metadata into separate (faster) storage
■ Block listing with bloom filters
■ Transactional commit log
7. Metastore: Requirements
Requirements for the metastore:
■ Low latency
■ Scalable
■ Transactional guarantees
Initial implementation: Postgres
Example: one (large) customer:
■ 2k parquet files / hour
■ 50k parquet files / day
■ 15 TB of data / day
■ 20 GB of metadata / day
8. Blocks
Important for listing: give me all the blocks for a table in a given time range.
Example:
■ Block URL:
s3://cgx-production-c4c-archive-data/cx/parquet/v1/team_id=555585/…
…dt=2022-12-02/hr=10/0246f9e9-f0da-4723-9b64-a12346095d25.parquet
■ Row group: 0, 1, 2 …
■ Min timestamp
■ Max timestamp
■ Number of rows
■ Total size
■ …
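As a rough illustration of the shape of one block entry, a minimal Rust sketch (field names here are assumptions for illustration, not the production schema):

// Illustrative shape of one block-metadata entry (field names assumed)
struct BlockMeta {
    block_url: String,      // s3://…/….parquet
    row_group: u32,         // 0, 1, 2, …
    min_timestamp: i64,
    max_timestamp: i64,
    num_rows: u64,
    total_size_bytes: u64,
}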
9. Bloom Filters
Used for pruning blocks when filtering by search term:
■ Is a given token maybe in this block, or definitely not?
Works by hashing each token multiple times and setting the corresponding bits to 1. When checking, hash the token again and verify that all of those bits are 1.
Specifically using a blocked bloom filter (a sequence of small bloom filters):
8192 blocks * 32 bytes = ~262 kB per filter
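A minimal Rust sketch of that check, assuming a simple std hasher and an illustrative probe count (the real hashing scheme is not given in the talk): hash the token to one 32-byte block, then verify every probe bit inside that block is set.

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BLOCK_BYTES: usize = 32;   // one bloom filter block = 256 bits
const NUM_BLOCKS: usize = 8192;  // 8192 * 32 bytes = ~262 kB per filter
const NUM_PROBES: u64 = 7;       // illustrative assumption, not the real probe count

fn hash(token: &str, seed: u64) -> u64 {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    token.hash(&mut h);
    h.finish()
}

// True if the token is *maybe* in the filter, false if it is definitely not.
fn maybe_contains(filter: &[u8], token: &str) -> bool {
    // A blocked bloom filter first hashes the token to one 32-byte block…
    let block_idx = hash(token, 0) as usize % NUM_BLOCKS;
    let block = &filter[block_idx * BLOCK_BYTES..(block_idx + 1) * BLOCK_BYTES];
    // …then checks that every probe bit inside that block is set to 1.
    (1..=NUM_PROBES).all(|seed| {
        let bit = hash(token, seed) as usize % (BLOCK_BYTES * 8);
        block[bit / 8] & (1u8 << (bit % 8)) != 0
    })
}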
10. Column Metadata
Per-column parquet metadata required for scanning and decoding a parquet file.
Example:
■ Block URL
■ Row Group
■ Column Name
■ Column metadata (blob)
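A hypothetical CQL layout for this table, shown here as a Rust constant (table and column names are assumptions for illustration, not the production schema): keying by (block, row group) keeps every column entry needed to decode one row group in a single partition.

// Hypothetical DDL; names are illustrative, not the real schema.
const CREATE_COLUMN_METADATA: &str = r#"
CREATE TABLE column_metadata (
    block_url   text,
    row_group   int,
    column_name text,
    metadata    blob,
    PRIMARY KEY ((block_url, row_group), column_name)
)"#;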
13. Bloom Filters
Problem: how do we verify that the bits are set?
Solution: read the bloom filters and check them in the application
Problem: ~50k blocks/day * 262 kB = ~12 GB of data, too much for one query
Solution:
■ chunk the bloom filters and split them into rows
■ by chunking per bloom filter block we read one 32-byte row per block per token:
50k blocks * 32 bytes = ~1.6 MB per token
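A minimal sketch of that chunking, assuming the filter is split along its 32-byte bloom filter blocks (chunk boundaries and naming are illustrative):

const BLOCK_BYTES: usize = 32;

// Split one ~262 kB blocked bloom filter into 8192 (chunk_index, 32-byte chunk)
// rows, so a query only reads the single chunk a token hashes to.
fn chunk_filter(filter: &[u8]) -> impl Iterator<Item = (usize, &[u8])> + '_ {
    filter.chunks(BLOCK_BYTES).enumerate()
}

fn main() {
    let filter = vec![0u8; 8192 * BLOCK_BYTES];
    // One row per block per token: a one-day query over ~50k blocks reads
    // 50_000 * 32 bytes ≈ 1.6 MB per token instead of ~12 GB of whole filters.
    assert_eq!(chunk_filter(&filter).count(), 8192);
}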
14. Bloom Filters: Primary Key (1)
Primary key: ((block_url, row_group), chunk index)
~8192 chunks of 32 bytes per bloom filter = ~262 kB per partition
Pros:
■ Easy to insert and delete, single batch query
Cons:
■ Need to know the block id before reading
■ A lot of partitions to access, 1 day: 50k partitions
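A hypothetical CQL rendering of layout (1), as a Rust constant (names are assumptions for illustration): the whole filter for one (block, row group) lives in a single partition, clustered by chunk index.

// Hypothetical DDL for primary key layout (1); names are illustrative.
const BLOOM_FILTERS_BY_BLOCK: &str = r#"
CREATE TABLE bloom_filters (
    block_url   text,
    row_group   int,
    chunk_index int,
    chunk       blob,  -- one 32-byte bloom filter block
    PRIMARY KEY ((block_url, row_group), chunk_index)
)"#;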
15. Bloom Filters: Primary Key (2)
Primary key: ((table url, hour, chunk index), block url, row group)
~2000 blocks per hour * one 32-byte chunk each = ~64 kB per partition
Pros:
■ Very fast listing, far fewer partitions.
1 day, 5 tokens: 24 * 5 = 120 partitions
■ No dependency on the block, can read in parallel
Cons:
■ Expensive to insert and delete: 8192 partitions for a single block!
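And the same table under layout (2), again as a hypothetical sketch: one partition per (table, hour, chunk index), with one 32-byte chunk per block clustered inside it.

// Hypothetical DDL for primary key layout (2); names are illustrative.
const BLOOM_FILTERS_BY_TABLE_HOUR: &str = r#"
CREATE TABLE bloom_filters (
    table_url   text,
    hour        timestamp,
    chunk_index int,
    block_url   text,
    row_group   int,
    chunk       blob,  -- one 32-byte bloom filter block
    PRIMARY KEY ((table_url, hour, chunk_index), block_url, row_group)
)"#;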
16. Bloom Filters: Future Approach
Investigate optimal chunking:
find a middle ground between writing large enough chunks and reading unnecessary data
Can we use UDFs with WebAssembly? (a sketch of the matching logic follows below)
SELECT block_url, row_group
FROM bloom_filters
WHERE table_url = ? AND hour = ? AND bloom_filter_matches(bloom_filter, indexes)
■ Let ScyllaDB do the hard work
■ Don’t need to worry about the amount of data we’re sending back to the app
■ Code is already written in Rust
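The matching logic such a UDF would need is small. A pure-Rust sketch of only the core check (the actual WebAssembly UDF ABI and the bloom_filter_matches signature are not given in the talk, so this is an assumption): given one stored chunk and a token’s precomputed bit indexes, verify every bit is set.

// Core of a hypothetical bloom_filter_matches UDF: are all of the token’s
// bit positions set in this 32-byte chunk? (WASM/UDF plumbing omitted;
// assumes every index is below chunk.len() * 8.)
fn bloom_filter_matches(chunk: &[u8], bit_indexes: &[usize]) -> bool {
    bit_indexes
        .iter()
        .all(|&bit| chunk[bit / 8] & (1u8 << (bit % 8)) != 0)
}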
17. Be Careful
It’s very much not SQL: try to avoid migrations (and the bugs they bring)
Solutions:
■ Rename columns?
■ Add new columns, UPDATE blocks SET query?
■ Truncate table and start over again
18. ScyllaDB: Ecosystem
Extensive usage of ScyllaDB libraries and components:
■ Written in Rust on top of the ScyllaDB Rust driver
■ ScyllaDB Operator for k8s
■ ScyllaDB Monitoring
■ ScyllaDB Manager
From knowing ScyllaDB exists to production-ready with terabytes of data in 2 months
19. Hardware
Cost is very important
3-node cluster:
■ 8 vCPU
■ 32 GiB memory
■ ARM/Graviton
■ EBS volumes (gp3)
■ 500 MBps bandwidth
■ 12k IOPS
20. Metastore: Block Listing
Largest cluster: 4-5 TB on each node, mostly for one customer
Writes:
■ p99 latency: <1ms
■ ~10k writes / s
Block listing:
■ Depends on query & whether we’re using bloom filters
■ for 1 hour: <20ms latency
■ for 1 day: <500ms latency
21. Metastore: Column Metadata
Reads:
■ p50 latency: 5ms
■ p99 latency: 100ms (at which point we time out)
Issue: a large number of concurrent queries
Probably a disk issue
22. Conclusion
■ Keep an eye on partition sizes
■ Think about read/write patterns
■ Very happy with block listing…
… but unpredictable tail latency for reading column metadata
■ Probably shouldn’t use EBS :-)
23. Thank You
Stay in Touch
Dan Harris
dan@coralogix.com
@thinkharderdev
github.com/thinkharderdev
www.linkedin.com/in/dsh2va
Sebastian Vercruysse
sebastian.vercruysse@coralogix.com
github.com/sebver
www.linkedin.com/in/sebastian-vercruysse