In this talk, Felipe Mendes, Solutions Architect at ScyllaDB, shares how 4 companies managed their migration. He covers:
Disney+ – No migration needed!
Discord – Shadow cluster
OpenWeb – TTL expiration, Load and Stream
MyHeritage – Counters
ShareChat – Bonus: A bit of everything
5. What's good
Migrations don't have to be a burden when you understand the basics
■ Online migration
○ Added complexity
○ Added load
■ Offline migration
○ Downtime
■ Common steps
○ Schema migration and adjustments
○ Existing data migration
○ Data validation
[Diagram: Client, DB, DB]
6. Migration Anatomy
[Timeline diagram, phases laid out over Time]
■ Write to DB-OLD / Read from DB-OLD
■ Migrate Schema
■ Capture Changes from DB-OLD
○ Dynamo Streams, CDC, Dual-writes…
■ Forklifting Existing Data
■ Replay Changes to DB-NEW
○ Consume from Kafka, AWS Lambda, Spark, etc
■ DBs in Sync
■ Validation*
■ Read from DB-NEW / Write to DB-NEW
■ Fade off DB-OLD
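The phases above can be sketched with in-memory stand-ins. This is a minimal illustration, not a real migration tool: the `MigrationSketch` class, its method names, and the use of plain dicts for DB-OLD and DB-NEW are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationSketch:
    """Hypothetical stand-in: dicts play the role of DB-OLD and DB-NEW."""
    old: dict = field(default_factory=dict)
    new: dict = field(default_factory=dict)
    captured: list = field(default_factory=list)
    capturing: bool = False

    def write(self, key, value):
        # Application writes still go to DB-OLD; once capture is on,
        # each change is also recorded for later replay.
        self.old[key] = value
        if self.capturing:
            self.captured.append((key, value))

    def start_capture(self):
        self.capturing = True

    def forklift(self, snapshot):
        # Bulk-copy a point-in-time snapshot of DB-OLD into DB-NEW.
        self.new.update(snapshot)

    def replay(self):
        # Apply captured changes to DB-NEW in the order they were captured.
        for key, value in self.captured:
            self.new[key] = value
        self.captured.clear()

    def in_sync(self):
        # Naive validation: compare both stores in full.
        return self.old == self.new


m = MigrationSketch()
m.write("a", 1)
m.start_capture()         # capture starts before the snapshot is taken
snapshot = dict(m.old)
m.write("a", 2)           # writes arriving while the forklift runs
m.write("b", 3)
m.forklift(snapshot)
m.replay()
print(m.in_sync())
```

Starting the capture before taking the snapshot is the design choice that matters here: a change may be applied twice, but none is lost.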
7. What's bad
Just the basics are often not enough
■ Technology switch
○ Application re-work
○ Data re-modeling
○ Tests
■ Tooling
○ Lack of tooling
○ Deal with incompatibilities
○ Cook your own
■ Edge cases
○ Special data-types (eg: Counters)
○ Serializability
○ Lack of similar functionality
○ Preserve attributes (eg: TTL, TIMESTAMP)
8. Considerations
Be REASONABLE
■ Can you afford some data loss?
○ IOT measurements
○ Logs and Traces
■ Does it make sense to forklift EVERYTHING?
○ Split the work in smaller steps
○ Retention periods
■ Plan Ahead
○ Identify pain points and improvements early
○ Save your own time
○ TEST
10. Streaming: Bulk Load
DynamoDB → ScyllaDB
■ The good
○ No Migration!
○ Simplified data modeling
■ The bad
○ Out of Order writes?
○ Implement record versioning?
○ Compression?
■ Considerations
○ Differences between your source and target databases
○ Identify room for improvement
[Diagram labels: Client, Client, Downstream feed]
11. Dual-writes and Out of Order Writes
[Diagram: a Client issuing Writes and Reads against the databases. Which one wins?]
12. Problem: Out of Order Writes
■ The CQL protocol allows one to manipulate the timestamp of writes
■ The goal is usually to always persist the most recent record, so that an
older write never overwrites the latest (valid) one.
■ Improvement:
● No need for LWT
● No need for read-before-write
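A minimal sketch of why client-supplied timestamps remove the need for LWT or read-before-write: the store keeps whichever write carries the higher timestamp, regardless of arrival order. The `Cell` class and both helper functions are hypothetical, and tie-breaking on equal timestamps is simplified compared with a real database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[str] = None
    write_ts: int = -1  # microseconds since epoch, like a CQL write timestamp


def apply_write(cell: Cell, value: str, write_ts: int) -> None:
    # Last-write-wins: only a strictly newer timestamp replaces the value.
    # (Assumption: on a tie this sketch keeps the existing value.)
    if write_ts > cell.write_ts:
        cell.value = value
        cell.write_ts = write_ts


def insert_with_timestamp(table: str, columns: list) -> str:
    # Builds a CQL INSERT whose last bind marker is the write timestamp.
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({cols}) VALUES ({marks}) USING TIMESTAMP ?"


cell = Cell()
apply_write(cell, "v2", write_ts=200)  # newer record arrives first
apply_write(cell, "v1", write_ts=100)  # older record arrives late, ignored
print(cell.value)
print(insert_with_timestamp("ks.events", ["id", "payload"]))
```

The table name `ks.events` and its columns are placeholders; the point is that the version travels with the write itself.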
13. Engagement Platform: TTL'd data
ScyllaDB → ScyllaDB
■ The good
○ Data modeling good to go!
○ Lift and Shift
■ The bad
○ Window for data loss
○ Manual process
○ No turning back
■ Considerations
○ Test and time each step thoroughly
○ Potentially repeat the process after initial switch
[Diagram labels: Client, Forklift snapshot]
14. TTL Data: How it is typically done
[Diagram labels: Client; Writes; Reads / Writes; TTL Expires; Reads]
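One common reading of this approach, assumed here: dual-writes start at some point, reads stay on the old cluster, and once the longest TTL has elapsed every row that predates the dual-writes has expired, so reads can move. The sketch below only computes that date; the function names and the 30-day TTL are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def earliest_read_switch(dual_writes_start: datetime, max_ttl_seconds: int) -> datetime:
    # After this instant, anything written before dual-writes began has expired.
    return dual_writes_start + timedelta(seconds=max_ttl_seconds)


def safe_to_switch_reads(now: datetime, dual_writes_start: datetime,
                         max_ttl_seconds: int) -> bool:
    return now >= earliest_read_switch(dual_writes_start, max_ttl_seconds)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
ttl = 30 * 24 * 3600  # assumed maximum TTL of 30 days
switch_at = earliest_read_switch(start, ttl)
print(switch_at.isoformat())
print(safe_to_switch_reads(datetime(2024, 1, 15, tzinfo=timezone.utc), start, ttl))
```

This only holds if every row carries a TTL no longer than `max_ttl_seconds`; rows without a TTL never age out and would need a separate forklift.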
15. Load and Stream: How we did it
[Diagram: Client; Transfer snapshots (one snapshot per node)]
nodetool refresh --load-and-stream
16. Messaging App: Shadow Cluster
Cassandra → ScyllaDB
■ The good
○ Same protocol
○ Zero data loss / User impact
■ The bad
○ Expensive ($$ and time)
○ Increased app complexity
■ Considerations
○ Throttle and balance source system traffic
○ Test under different scenarios
[Diagram labels: Client, CQL, CQL]
17. CQL – TTL & WRITETIME Quirks
■ Complex data-types are challenging
● UDTs, collections (frozen & non-frozen)
■ Heavy use of collections could introduce performance overhead
■ USING TIMESTAMP: Manipulate timestamps (WRITETIME)
■ USING TTL: Manipulate TTL
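As a rough sketch of preserving both attributes for a single regular column: read `TTL()` and `WRITETIME()` alongside the value, then feed them back through `USING TIMESTAMP ... AND TTL ...`. The helper names, keyspace, table, and columns are hypothetical, and the sketch deliberately ignores collections and UDTs, which are the hard part noted above.

```python
def build_copy_statements(keyspace, table, key_cols, value_col):
    """Return (select, insert) CQL strings for copying one regular column."""
    keys = ", ".join(key_cols)
    select = (
        f"SELECT {keys}, {value_col}, TTL({value_col}), WRITETIME({value_col}) "
        f"FROM {keyspace}.{table}"
    )
    marks = ", ".join("?" for _ in range(len(key_cols) + 1))
    insert = (
        f"INSERT INTO {keyspace}.{table} ({keys}, {value_col}) "
        f"VALUES ({marks}) USING TIMESTAMP ? AND TTL ?"
    )
    return select, insert


def insert_params(row):
    # row layout matches the SELECT: keys..., value, ttl, writetime.
    *cells, ttl, writetime = row
    # TTL() yields None for data without a TTL; 0 is used here to mean
    # "no expiration" (assumption: no table-level default TTL interferes).
    return (*cells, writetime, ttl or 0)


select_cql, insert_cql = build_copy_statements("ks", "users", ["id"], "name")
print(select_cql)
print(insert_cql)
print(insert_params((42, "ada", None, 1700000000000000)))
```

Because the remaining TTL is read at scan time, the copied row expires at roughly the same moment as the original, minus however long the copy itself takes.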
18. CQL to CQL – Under the hood
■ Data is distributed in a Token Ring
● Scan through: SELECT * FROM table WHERE token(pk) >= ? AND token(pk) < ?
● Save progress: With dual-writing there's no need to scan a token range again
■ If source has complex types:
● Hardcode TARGET timestamp to (time_before_migration_starts + grace_period)
■ Parallelize your work: Move faster, but careful with your source
[Diagrams: Token Ring; Migration Parallelism]
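The token-range scan can be sketched as follows, assuming the Murmur3 partitioner's signed 64-bit token space. The function names and the `pk` column are hypothetical. Unlike the half-open form on the slide, this variant uses inclusive bounds so that the very last token is also covered.

```python
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(parts: int):
    """Split the full token space into `parts` contiguous inclusive ranges."""
    if parts < 1:
        raise ValueError("parts must be >= 1")
    total = MAX_TOKEN - MIN_TOKEN + 1
    ranges = []
    start = MIN_TOKEN
    for i in range(1, parts + 1):
        end = MIN_TOKEN + (total * i) // parts - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def scan_query(table: str, pk: str) -> str:
    # Inclusive on both ends to match split_token_ring.
    return f"SELECT * FROM {table} WHERE token({pk}) >= ? AND token({pk}) <= ?"


def pending(ranges, done):
    # Saved progress: skip ranges that were already migrated.
    return [r for r in ranges if r not in done]


ranges = split_token_ring(4)
done = {ranges[0]}  # e.g. loaded from a progress file after a restart
for start, end in pending(ranges, done):
    print(scan_query("ks.users", "pk"), (start, end))
```

Each pending range can be handed to a separate worker; more, smaller ranges give finer-grained progress tracking at the cost of more queries against the source.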
19. Genealogy Platform: Counter Tables
Cassandra → ScyllaDB
■ The good
○ Nothing, really
■ The bad
○ Counters 🥲
○ Extremely complex
■ Considerations
○ Understand your data types
○ Consider:
■ Introducing a small (seconds) offline window OR;
■ Split updates to another source of truth
[Diagram labels: Client, CQL, CQL, Forklift snapshot]
20. The problem with Counters
■ A Conflict-free Replicated Data Type
● Concurrent updates converge to a stable value
■ Supports increment and decrement
● Can NOT be set to a given value!
■ Cassandra < 2.1 counters are dangerous:
● See: https://github.com/scylladb/scylladb/issues/4219
22. Counters: How we did it
[Diagram: Client and per-node snapshots, with the labeled steps below]
■ Transfer snapshots to Final (empty) table
■ Write to Delta table
■ Write to Final (restored) table
■ Sideload (inc/dec) Delta to Final
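One plausible reading of this flow, sketched with in-memory dicts: while snapshots are being restored, updates accumulate in a delta table; afterwards each accumulated delta is applied to the final table as an increment or decrement, since a counter cannot be set to a value. All names and sample keys below are hypothetical.

```python
from collections import defaultdict

# Statement shape for applying a delta; a negative bind value decrements.
SIDELOAD_CQL = "UPDATE ks.final SET hits = hits + ? WHERE id = ?"

final = {"page:1": 10, "page:2": 4}  # restored from snapshots
delta = defaultdict(int)             # updates received during the restore


def record_update(delta, key, amount):
    # Client write path while the final table is still being restored.
    delta[key] += amount


def sideload(final, delta):
    # Apply each net delta as inc/dec; never overwrite the counter value.
    for key, amount in delta.items():
        if amount:
            final[key] = final.get(key, 0) + amount
    delta.clear()


record_update(delta, "page:1", 2)
record_update(delta, "page:1", 1)
record_update(delta, "page:2", -1)
record_update(delta, "page:3", 1)
sideload(final, delta)
print(final)
```

Clearing the delta after the sideload matters: counter increments are not idempotent, so applying the same delta twice would double-count.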