How to Migrate a Counter Table for 68 Billion Records
Robert Czupioł
Senior Platform Engineer
Robert Czupioł
■ Cassandra Certified Expert since 2015
■ Introduced and managed C*/Scylla at different companies
■ Attendee at the first Scylla Summit in 2016
■ …and 2017, 2018… :)
Senior Platform Engineer
■ Dating app
■ Top 3 in Western Europe
■ 100M+ customers
■ 9 Scylla clusters (previously 16 Cassandra)
■ 200+ TB of data
■ avg 300k req/sec
Find the people you've crossed paths with
Decision
In May 2021, we decided to migrate to ScyllaDB
■ Targets
• TCO
• Technical debt
• Data volume
• Latency
• Monitoring
Crossings Cluster
■ Second biggest cluster
• 48 nodes (4 CPU, 32 GB, 1TB PD-SSD)
■ Debt
• Debian 8
• Uptime of 580+ days
• Cassandra 2.1
■ 8 active Tables
• Crossings_count
Migrating a table with 68B+ records
Type of migration
■ Offline
• Not our case
■ Online, in 3 steps
• Double writes by the μservice (sketch below)
• Leverage the data
• Open a bottle of Champagne
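A minimal sketch of that double-write step, assuming gocql and a hypothetical crossings_count table with an id partition key and a cnt counter column (not the production schema):

```go
// Hypothetical double-write helper: every counter increment goes to both
// clusters. Table and column names (crossings_count, id, cnt) are
// illustrative assumptions, not the real schema.
package migration

import "github.com/gocql/gocql"

func IncrementBoth(cass, scylla *gocql.Session, id gocql.UUID, delta int64) error {
	const stmt = `UPDATE crossings_count SET cnt = cnt + ? WHERE id = ?`
	// The old cluster stays the source of truth, so fail fast on it.
	if err := cass.Query(stmt, delta, id).Exec(); err != nil {
		return err
	}
	// Best effort on the new cluster: any gap is closed later by the
	// "leverage data" compare-and-set pass.
	return scylla.Query(stmt, delta, id).Exec()
}
```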
Leverage the data
Different strategies
■ CSV
• CQLSH/DSBulk (writetime issue)
■ SSTable
• sstableloader (stresses the cluster)
• nodetool refresh (IMO best with network disks)
■ Dual connect
• Scylla Migrator (Spark cluster)
• Own application
Counter table
■ Breaks the idempotency rule
■ Update-only
■ Weird delete semantics
■ Different implementations in the past (local and remote shards)
■ No USING TIMESTAMP
■ No TTL
■ Not accurate
Counter table
■ Value has the range of a 64-bit long (sketch below)
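To make those constraints concrete, here is a hedged sketch of what a counter table looks like and why it is update-only; the keyspace and table names are illustrative:

```go
// Illustrative counter-table shape (names assumed, not the real schema).
// A counter column is a signed 64-bit long; it supports no TTL and no
// USING TIMESTAMP, and no INSERT - only delta UPDATEs, which are not
// idempotent when retried.
package migration

import "github.com/gocql/gocql"

func SetupAndBump(session *gocql.Session, id gocql.UUID) error {
	if err := session.Query(`CREATE TABLE IF NOT EXISTS app.crossings_count (
		id  uuid PRIMARY KEY,
		cnt counter
	)`).Exec(); err != nil {
		return err
	}
	// The only write path: add a delta. "SET cnt = 5" is not valid CQL.
	return session.Query(
		`UPDATE app.crossings_count SET cnt = cnt + ? WHERE id = ?`,
		int64(1), id).Exec()
}
```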
How counters work
■ A dedicated node counter id (shard) is created for each node [RF=2]
NODE A
Node counter id | Shard's logical clock | Shard's value
A_1             | 0                     | 0
B_1             | 1                     | 1

NODE B
Node counter id | Shard's logical clock | Shard's value
A_1             | 0                     | 0
B_1             | 1                     | 1
Update operation
■ On node B, increment by 2
■ Read the previous shard value
■ Generate the newest logical clock
■ Save the new value and send it to the replica (toy model after the tables)
NODE A
Node counter id | Shard's logical clock | Shard's value
A_1             | 0                     | 0
B_1             | 2                     | 3

NODE B
Node counter id | Shard's logical clock | Shard's value
A_1             | 0                     | 0
B_1             | 2                     | 3

Node B increments by 2
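The update path above, restated as a toy model in Go; this mirrors the slides' description, not Scylla's actual implementation:

```go
// Toy model of a counter replica, following the slides: each replica
// stores (logical clock, value) per node counter id, and a node only
// ever rewrites its own shard before shipping it to the other replicas.
package migration

type Shard struct {
	Clock int64
	Value int64
}

type Node struct {
	CounterID string           // this node's own shard id, e.g. "B_1"
	Shards    map[string]Shard // all shards this replica knows about
}

// ApplyDelta follows the four steps: read the previous shard value,
// generate the newest logical clock, save the new value, and return the
// shard so it can be sent to the replica.
func (n *Node) ApplyDelta(delta int64) Shard {
	s := n.Shards[n.CounterID]
	s.Clock++
	s.Value += delta
	n.Shards[n.CounterID] = s
	return s
}
```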
Update operations
■ On node A, decrement by 5
■ Read the previous shard value
■ Generate the newest logical clock
■ Save the new value and send it to the replica
NODE A
Node counter id | Shard's logical clock | Shard's value
A_1             | 1                     | -5
B_1             | 2                     | 3

NODE B
Node counter id | Shard's logical clock | Shard's value
A_1             | 1                     | -5
B_1             | 2                     | 3

Node A decrements by 5
Read operations
■ While reading, a node merges the values from all shards (sketch after the table)
Value = 3 + (-5) = -2
Node counter id | Shard's logical clock | Shard's value
A_1             | 1                     | -5
B_1             | 2                     | 3
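And the read side, continuing the same toy model (same package, reusing the Shard type from the sketch above): per shard, the entry with the higher logical clock wins, and the counter's value is the sum of all shard values, 3 + (-5) = -2 here.

```go
// Merge two replicas' shard maps (higher logical clock wins per shard)
// and read the counter as the sum of shard values. Toy model only.
package migration

func Merge(a, b map[string]Shard) map[string]Shard {
	out := make(map[string]Shard, len(a))
	for id, s := range a {
		out[id] = s
	}
	for id, s := range b {
		if cur, ok := out[id]; !ok || s.Clock > cur.Clock {
			out[id] = s // newer logical clock wins for this shard
		}
	}
	return out
}

func Read(shards map[string]Shard) int64 {
	var total int64
	for _, s := range shards {
		total += s.Value // e.g. 3 + (-5) = -2
	}
	return total
}
```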
How to migrate that s..tuff?
Counter migration approach
[Diagram: double write — the old cluster's counter keeps growing (20, 21, 22, …) while the μservice also applies every increment to the new cluster, which counts up from zero (1, 2, 3, …)]
[Diagram: leverage the data — backfilling the historical difference brings the new cluster's counter in line with the old one (…, 28, 29, 30)]
…and we’ve written our own app
Counter-migrator
■ Java
• All μservices were written in that language
• Well known
■ Spread across the token ring
• 6144 ranges
• select * from table where token(a) >= ? and token(a) < ?
■ Compare and set (sketch below)
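A hedged sketch of that compare-and-set pass over one token range, assuming gocql and the same illustrative crossings_count schema; since counters cannot be set directly, the "set" is an UPDATE by the delta between source and destination:

```go
// Migrate one token range: scan the source, read the destination, and
// apply the difference as a counter delta. Schema names are assumptions.
package migration

import (
	"log"

	"github.com/gocql/gocql"
)

func MigrateRange(src, dst *gocql.Session, lo, hi int64) {
	iter := src.Query(`SELECT id, cnt FROM crossings_count
		WHERE token(id) >= ? AND token(id) < ?`, lo, hi).Iter()

	var id gocql.UUID
	var srcCnt int64
	for iter.Scan(&id, &srcCnt) {
		// A row missing on the destination simply reads as 0.
		var dstCnt int64
		_ = dst.Query(`SELECT cnt FROM crossings_count WHERE id = ?`, id).
			Scan(&dstCnt)

		// Compare and "set": counters only accept deltas.
		if delta := srcCnt - dstCnt; delta != 0 {
			if err := dst.Query(
				`UPDATE crossings_count SET cnt = cnt + ? WHERE id = ?`,
				delta, id).Exec(); err != nil {
				log.Printf("update %v: %v", id, err)
			}
		}
	}
	if err := iter.Close(); err != nil {
		log.Printf("range [%d, %d): %v", lo, hi, err)
	}
}
```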
Some pitfalls
Out of memory
■ Split into more ranges
■ 68B / 6144 ≈ 11M records per range
■ Split into 600,000 ranges ≈ 100k records each
■ And shuffle those ranges - remember about shards! (sketch below)
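A sketch of that fix under the stated numbers: carve the Murmur3 token ring into ~600,000 ranges and shuffle them so neighbouring ranges, which tend to land on the same shards, don't run back to back:

```go
// Split the full Murmur3 token ring into n ranges (n should be large,
// e.g. 600000, so each range holds roughly 100k of the 68B records)
// and shuffle them to spread the load across shards.
package migration

import (
	"math"
	"math/rand"
)

type TokenRange struct{ Lo, Hi int64 } // scanned as token >= Lo AND token < Hi

func SplitAndShuffle(n int) []TokenRange {
	ranges := make([]TokenRange, 0, n)
	step := int64(math.MaxUint64 / uint64(n)) // ring width / n
	lo := int64(math.MinInt64)
	for i := 0; i < n; i++ {
		hi := lo + step
		if i == n-1 {
			hi = math.MaxInt64 // close the ring at the top
		}
		ranges = append(ranges, TokenRange{lo, hi})
		lo = hi
	}
	// Shuffle so consecutive workers don't hammer the same shard.
	rand.Shuffle(len(ranges), func(i, j int) {
		ranges[i], ranges[j] = ranges[j], ranges[i]
	})
	return ranges
}
```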
Java and Spring…
■ 30-second Spring context start
■ JVM heap
■ A bunch of machines
■ Switched to Golang
Missing alerting and swap
■ One node went down (no swap, and another process on the host)
■ Alerts were not set up properly
■ Aggressive hinted-handoff workload
Tune driver and CL
■ The defaults are always wrong
■ Choose the consistency level deliberately, even ALL
■ Use the Scylla (shard-aware) driver (sketch below)
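For illustration, tuning the Go driver along those lines; the shard-aware Scylla fork (github.com/scylladb/gocql) is a drop-in replacement for gocql, typically wired in via a go.mod replace directive, and the hosts and values here are placeholders:

```go
// Driver tuning sketch: explicit consistency level, token-aware routing,
// and deliberate timeouts instead of the defaults. Hosts are placeholders.
package migration

import (
	"time"

	"github.com/gocql/gocql" // swap in github.com/scylladb/gocql for shard awareness
)

func NewSession() (*gocql.Session, error) {
	cluster := gocql.NewCluster("scylla-1", "scylla-2", "scylla-3")
	cluster.Consistency = gocql.All // strict CL for the migration pass
	cluster.Timeout = 5 * time.Second
	cluster.NumConns = 4
	cluster.PoolConfig.HostSelectionPolicy =
		gocql.TokenAwareHostPolicy(gocql.RoundRobinHostPolicy())
	return cluster.CreateSession()
}
```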
Avoid network latency
■ Use batches (sketch below)
■ Increase the warning threshold so batch warnings don't flood the journal
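A sketch of the batching idea; note that counter updates must go into a counter batch (they cannot be mixed into logged batches), and the row type here is an assumption:

```go
// Batch counter deltas to cut network round trips. Counter updates
// require gocql.CounterBatch; table/column names remain illustrative.
package migration

import "github.com/gocql/gocql"

type Delta struct {
	ID    gocql.UUID
	Value int64
}

func ApplyBatch(dst *gocql.Session, deltas []Delta) error {
	batch := dst.NewBatch(gocql.CounterBatch)
	for _, d := range deltas {
		batch.Query(`UPDATE crossings_count SET cnt = cnt + ? WHERE id = ?`,
			d.Value, d.ID)
	}
	return dst.ExecuteBatch(batch)
}
```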
Result?
Result
■ 2x n2-standard-8
■ 5 days
Metrics
■ API calls hitting the DB:
• p99: 80 ms → 20 ms
• p90: 50 ms → 15 ms
Disk space
■ Cassandra 2.1
• 48 TB
• 45% disk utilization
■ Scylla 4.4
• 18 TB
• 55% disk utilization
[Bar chart: disk usage, Cassandra vs. Scylla]
We’ve checked that all the data exists :)
Disk space
■ 'md' SSTable format
■ Aggressive compaction
■ Zstd compression (sketch after the chart)
[Bar chart: disk usage, Cassandra vs. Scylla]
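For example, switching the table to Zstd compression looks roughly like this; a sketch only, since the exact options depend on the Scylla version, and the table name is assumed:

```go
// Illustrative: enable Zstd SSTable compression on the migrated table.
package migration

import "github.com/gocql/gocql"

func EnableZstd(session *gocql.Session) error {
	return session.Query(`ALTER TABLE app.crossings_count WITH compression = {
		'sstable_compression': 'ZstdCompressor'
	}`).Exec()
}
```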
Final result
■ 48 C* nodes => 6 Scylla nodes
■ Improved GCS cost with incremental snapshots
■ N2 and local SSD - committed use discounts
■ TCO reduced by 75%
Lessons learned
■ Before migration
• Clean up and repair the cluster
• Sometimes even compact
■ Remember about table properties
■ Adjust scylla.yaml (sketch below)
• Hints window time
• max_partition_key_restrictions_per_query (or better, improve the μservice code)
• internode_compression
• batch_size_warn/fail thresholds
■ Keep improvements and changes in IaC such as Ansible playbooks
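A sketch of the kind of scylla.yaml overrides meant here; the keys are standard options, but the values are illustrative examples, not recommendations:

```yaml
# Illustrative scylla.yaml overrides (example values, not advice).
hinted_handoff_enabled: true
max_hint_window_in_ms: 10800000           # hints window, 3 hours
max_partition_key_restrictions_per_query: 100
internode_compression: all                # none | all | dc
batch_size_warn_threshold_in_kb: 128
batch_size_fail_threshold_in_kb: 1024
```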
Thank you!
Stay in touch
Robert Czupioł
linkedin.com/in/robert-czupioł-2a34b394
robert.czupiol@gmail.com
