SESSION 2:
The Anatomy of a Migration
Felipe Cardeneti Mendes
Rustyrazorblade Consulting
Felipe Cardeneti Mendes
● Solution Architect at ScyllaDB
● Published Author
● Linux and Open Source enthusiast
Agenda
● The Anatomy of a Migration
○ What's good & what's bad
○ Considerations
● Beyond Theory: Practical Strategies
○ Bulk load
○ TTL expiration & Load'n'Stream
○ Shadow writes
○ Counters
○ Large scale migrations
The Anatomy of a Migration
What's good
Migrations don't have to be a burden when you understand the basics
■ Online migration
○ Added complexity
○ Added load
■ Offline migration
○ Downtime
■ Common steps
○ Schema migration and adjustments
○ Existing data migration
○ Data validation
Migration Anatomy (timeline)
■ Write to DB-OLD, Read from DB-OLD
■ Migrate Schema
■ Capture Changes from DB-OLD
○ Dynamo Streams, CDC, Dual-writes…
■ Forklifting Existing Data
■ Replay Changes to DB-NEW
○ Consume from Kafka, AWS Lambda, Spark, etc.
■ DBs in Sync
■ Validation*
■ Read from DB-NEW, Write to DB-NEW
■ Fade off DB-OLD
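The timeline can be read as a sequence of phases that decide where the client reads and writes. Below is a minimal Python sketch, assuming dual-writes are used to capture changes. The `Phase` names and the `targets` helper are hypothetical and only illustrate the ordering; a real cutover may switch reads and writes at different moments.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical phase names that loosely follow the timeline labels."""
    OLD_ONLY = 1     # read and write DB-OLD only
    DUAL_WRITE = 2   # changes are captured and replayed to DB-NEW
    IN_SYNC = 3      # forklift finished, validation in progress
    NEW_ONLY = 4     # DB-OLD has been faded off


def targets(phase: Phase) -> dict:
    """Return which databases serve reads and writes in a given phase."""
    if phase is Phase.OLD_ONLY:
        return {"read": "DB-OLD", "write": ["DB-OLD"]}
    if phase in (Phase.DUAL_WRITE, Phase.IN_SYNC):
        # Reads stay on DB-OLD until validation passes; writes go to both.
        return {"read": "DB-OLD", "write": ["DB-OLD", "DB-NEW"]}
    return {"read": "DB-NEW", "write": ["DB-NEW"]}


for phase in Phase:
    print(phase.name, targets(phase))
```

Keeping the routing decision in one place makes it easier to step back a phase if validation fails.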
What's bad
Just the basics are often not enough
■ Technology switch
○ Application re-work
○ Data re-modeling
○ Tests
■ Tooling
○ Lack of tooling
○ Deal with incompatibilities
○ Cook your own
■ Edge cases
○ Special data-types (eg: Counters)
○ Serializability
○ Lack of similar functionality
○ Preserve attributes (eg: TTL, TIMESTAMP)
Considerations
Be REASONABLE
■ Can you afford some data loss?
○ IoT measurements
○ Logs and Traces
■ Does it make sense to forklift EVERYTHING?
○ Split the work into smaller steps
○ Retention periods
■ Plan Ahead
○ Identify pain points and improvements early
○ Save your own time
○ TEST
Beyond Theory: Practical Strategies
Streaming: Bulk Load
DynamoDB → ScyllaDB
■ The good
○ No Migration!
○ Simplified data modeling
■ The bad
○ Out of Order writes?
○ Implement record versioning?
○ Compression?
■ Considerations
○ Differences between your source and target databases
○ Identify room for improvement
[Diagram: Client, Client, Downstream feed]
Dual-writes and Out of Order Writes
[Diagram: Client issuing Writes and Reads against the databases. Which one wins?]
Problem: Out of Order Writes
■ The CQL protocol lets the client set the timestamp of each write
■ The goal is usually to always persist the latest record, so that an older write never overwrites the latest (valid) one
■ Improvement:
● No need for LWT
● No need for read-before-write
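To see why client-supplied timestamps remove the need for LWT or read-before-write, here is a small Python simulation of last-write-wins reconciliation. The `Cell` class and `apply_write` function are hypothetical stand-ins for what the database does per cell, and `ks.events` is a made-up table name. Timestamp ties are ignored in this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    value: Optional[str] = None
    timestamp: int = -1  # microseconds since epoch, like CQL write timestamps


def apply_write(cell: Cell, value: str, timestamp: int) -> None:
    """Last-write-wins: keep the value that carries the highest timestamp.

    Roughly what happens for a statement such as:
      INSERT INTO ks.events (id, payload) VALUES (?, ?) USING TIMESTAMP ?
    """
    if timestamp > cell.timestamp:
        cell.value = value
        cell.timestamp = timestamp


# The same two writes, delivered in opposite orders.
in_order, out_of_order = Cell(), Cell()

apply_write(in_order, "v1", 1_000)
apply_write(in_order, "v2", 2_000)

apply_write(out_of_order, "v2", 2_000)
apply_write(out_of_order, "v1", 1_000)  # stale write arrives late and loses

print(in_order.value, out_of_order.value)
```

Both cells end up holding the newer value, because the outcome depends on the timestamps rather than on arrival order.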
Engagement Platform: TTL'd data
ScyllaDB → ScyllaDB
■ The good
○ Data modeling good to go!
○ Lift and Shift
■ The bad
○ Window for data loss
○ Manual process
○ No turning back
■ Considerations
○ Test and time each step thoroughly
○ Potentially repeat the process after initial switch
[Diagram: Client; Forklift from snapshot]
TTL Data: How it is typically done
[Diagram: Client; Writes; Reads / Writes; TTL Expires; Reads]
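The typical TTL approach comes down to date arithmetic: once dual writes start, any row that exists only in the old cluster expires within one TTL. Below is a minimal sketch, assuming every row uses the same table-wide TTL. The `earliest_read_cutover` helper and the 30-day TTL are hypothetical values for illustration.

```python
from datetime import datetime, timedelta, timezone


def earliest_read_cutover(dual_writes_started: datetime, ttl: timedelta) -> datetime:
    """Earliest moment reads can move to the new cluster without a forklift.

    Rows written before dual writes began exist only in the old cluster,
    and all of them have expired once a full TTL has elapsed since then.
    """
    return dual_writes_started + ttl


started = datetime(2024, 1, 1, tzinfo=timezone.utc)
cutover = earliest_read_cutover(started, ttl=timedelta(days=30))
print(cutover.isoformat())
```

If some rows are written with a longer TTL than the table default, the longest TTL in use is the one that matters.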
Load and Stream: How we did it
[Diagram: Client; Transfer snapshots]
nodetool refresh --load-and-stream
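A minimal sketch of how the refresh step could be scripted per table, assuming the snapshot files have already been copied into each table's upload directory on the target node. The keyspace and table names are hypothetical, and the sketch only builds and prints the command rather than running it.

```python
import shlex


def refresh_command(keyspace: str, table: str) -> list:
    """Build the nodetool invocation named on the slide for one table."""
    return ["nodetool", "refresh", keyspace, table, "--load-and-stream"]


# Hypothetical keyspace and table names.
for table in ["events", "profiles"]:
    print(shlex.join(refresh_command("engagement", table)))
```

Passing the resulting list to `subprocess.run` would execute it; timing that step on a test cluster first follows the "test and time each step" advice above.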
Messaging App: Shadow Cluster
Cassandra → ScyllaDB
■ The good
○ Same protocol
○ Zero data loss / User impact
■ The bad
○ Expensive ($$ and time)
○ Increased app complexity
■ Considerations
○ Throttle and balance source system traffic
○ Test under different scenarios
CQL – TTL & WRITETIME Quirks
■ Complex data-types are challenging
● UDTs, collections (frozen & non-frozen)
■ Heavy use of collections could introduce performance overhead
■ USING TIMESTAMP: Manipulate timestamps (WRITETIME)
■ USING TTL: Manipulate TTL
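A minimal sketch of how a copy job might carry these attributes across: read `WRITETIME` and `TTL` for a regular column on the source, then replay them with `USING TIMESTAMP` and `USING TTL` on the target. The helper names and the `ks.messages` table are hypothetical. The sketch handles one simple column only, which hints at why UDTs and collections need extra care.

```python
from typing import Optional


def select_with_metadata(keyspace: str, table: str, key: str, column: str) -> str:
    """SELECT that also reads the cell's write timestamp and remaining TTL."""
    return (
        f"SELECT {key}, {column}, WRITETIME({column}), TTL({column}) "
        f"FROM {keyspace}.{table}"
    )


def insert_preserving(keyspace: str, table: str, key: str, column: str,
                      ttl: Optional[int]) -> str:
    """INSERT that carries the source timestamp, plus the TTL when one is set.

    TTL(column) is null for cells written without a TTL, so the TTL clause
    is only added when the source row actually had one.
    """
    using = "USING TIMESTAMP ?"
    if ttl is not None:
        using += " AND TTL ?"
    return f"INSERT INTO {keyspace}.{table} ({key}, {column}) VALUES (?, ?) {using}"


print(select_with_metadata("ks", "messages", "id", "body"))
print(insert_preserving("ks", "messages", "id", "body", ttl=3600))
print(insert_preserving("ks", "messages", "id", "body", ttl=None))
```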
CQL to CQL – Under the hood
■ Data is distributed in a Token Ring
● Scan through: SELECT * FROM table WHERE token(pk) >= ? AND token(pk) < ?
● Save progress: With dual-writing there's no need to scan a token range again
■ If source has complex types:
● Hardcode TARGET timestamp to (time_before_migration_starts + grace_period)
■ Parallelize your work: Move faster, but be careful with your source
[Diagrams: Token Ring; Migration Parallelism]
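The token-range scan can be sketched as follows: split the ring into contiguous sub-ranges, hand each one to a worker, and record finished ranges so they are not scanned again. This assumes the Murmur3 partitioner's 64-bit token range. `split_token_ring` and `scan_query` are hypothetical helpers, and `pk` stands for the table's partition key.

```python
MIN_TOKEN = -(2 ** 63)   # Murmur3 partitioner token range
MAX_TOKEN = 2 ** 63 - 1


def split_token_ring(parts: int) -> list:
    """Split the full token range into `parts` contiguous (start, end) pairs.

    Each pair is half-open [start, end), except the last one, which is
    closed so that MAX_TOKEN itself is covered.
    """
    total = MAX_TOKEN - MIN_TOKEN + 1
    bounds = [MIN_TOKEN + (total * i) // parts for i in range(parts)]
    bounds.append(MAX_TOKEN)
    return list(zip(bounds[:-1], bounds[1:]))


def scan_query(table: str, pk: str, last: bool) -> str:
    """Range query for one sub-range; the last one includes its upper bound."""
    upper = "<=" if last else "<"
    return f"SELECT * FROM {table} WHERE token({pk}) >= ? AND token({pk}) {upper} ?"


ranges = split_token_ring(4)
done = set()  # saved progress: ranges that were fully copied

for i, (start, end) in enumerate(ranges):
    if (start, end) in done:
        continue  # with dual-writes in place, no need to rescan this range
    query = scan_query("ks.messages", "id", last=(i == len(ranges) - 1))
    print(start, end, query)
    done.add((start, end))
```

More, smaller ranges give finer-grained progress and more parallelism, at the cost of more read pressure on the source cluster.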
Genealogy Platform: Counter Tables
Cassandra → ScyllaDB
■ The good
○ Nothing, really
■ The bad
○ Counters 🥲
○ Extremely complex
■ Considerations
○ Understand your data types
○ Consider:
■ Introducing a small (seconds) offline window OR;
■ Split updates to another source of truth
[Diagram: Client; CQL; CQL; Forklift from snapshot]
The problem with Counters
■ A Conflict-free Replicated Data Type
● Concurrent updates converge to a stable value
■ Supports increment and decrement
● Can NOT be set to a given value!
■ Cassandra < 2.1 counters are dangerous:
● See: https://github.com/scylladb/scylladb/issues/4219
A potential approach
https://www.scylladb.com/2022/09/01/happn-falling-in-love-with-scylladb/
Counters: How we did it
[Diagram: Client; Transfer snapshots to Final (empty) table; Write to Delta table; Sideload (inc/dec) Delta to Final; Write to Final (restored) table]
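The diagram labels suggest the following flow, sketched here with in-memory Python counters. This is one reading of the labels, not a confirmed implementation: the ordering of the steps, the `client_update` and `sideload` helpers, and the sample key are all assumptions.

```python
from collections import Counter

# Final table: starts empty, then receives the restored snapshot values.
final = Counter()
# Delta table: receives every client increment/decrement during the migration.
delta = Counter()


def client_update(key: str, amount: int, migrating: bool) -> None:
    """Route a counter update to Delta while migrating, to Final afterwards."""
    (delta if migrating else final)[key] += amount


def sideload(delta_table: Counter, final_table: Counter) -> None:
    """Apply each accumulated delta to Final as an increment or decrement."""
    for key, amount in delta_table.items():
        final_table[key] += amount
    delta_table.clear()


client_update("page:1", +3, migrating=True)   # lands in Delta
client_update("page:1", -1, migrating=True)
final.update({"page:1": 10})                  # snapshot restored into Final
sideload(delta, final)                        # 10 + (3 - 1) = 12
client_update("page:1", +1, migrating=False)  # clients now write to Final
print(final["page:1"])
```

Because counters only support increments and decrements, the deltas are added on top of the restored values instead of overwriting them.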
ShareChat: (Large)² Scale Migration
https://www.scylladb.com/presentations/sharechats-journey-migrating-100tb-of-data-to-scylladb-with-no-downtime/
Keep in touch!
Felipe Mendes
Solution Architect
ScyllaDB
felipemendes@scylladb.com
LinkedIn

NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration

  • 1. SESSION 2: The Anatomy of a Migration Felipe Cardeneti Mendes Rustyrazorblade Consulting
  • 2. Felipe Cardeneti Mendes
    ● Solution Architect at ScyllaDB
    ● Published Author
    ● Linux and Open Source enthusiast
  • 3. Agenda
    ● The Anatomy of a Migration
      ○ What's good & what's bad
      ○ Considerations
    ● Beyond Theory: Practical Strategies
      ○ Bulk load
      ○ TTL expiration & Load'n'Stream
      ○ Shadow writes
      ○ Counters
      ○ Large scale migrations
  • 4. The Anatomy of a Migration
  • 5. What's good
    Migrations don't have to be a burden when you understand the basics
    ■ Online migration
      ○ Added complexity
      ○ Added load
    ■ Offline migration
      ○ Downtime
    ■ Common steps
      ○ Schema migration and adjustments
      ○ Existing data migration
      ○ Data validation
    [Diagram: Client, DB, DB]
  • 6. Migration Anatomy
    [Timeline diagram along a Time axis; labels as extracted]
    ■ Write to DB-OLD
    ■ Read from DB-OLD
    ■ Migrate Schema
    ■ Forklifting Existing Data
    ■ Replay Changes to DB-NEW
    ■ Capture Changes from DB-OLD
    ■ DBs in Sync
    ■ Validation*
    ■ Fade off DB-OLD
    ■ Read from DB-NEW
    ■ Write to DB-NEW
    Annotations: Consume from Kafka, AWS Lambda, Spark, etc; Dynamo Streams, CDC, Dual-writes…
  • 7. What's bad
    Just the basics are often not enough
    ■ Technology switch
      ○ Application re-work
      ○ Data re-modeling
      ○ Tests
    ■ Tooling
      ○ Lack of
      ○ Deal with incompatibilities
      ○ Cook your own
    ■ Edge cases
      ○ Special data-types (eg: Counters)
      ○ Serializability
      ○ Lack of similar functionality
      ○ Preserve attributes (eg: TTL, TIMESTAMP)
    [Diagram: Client, DB, DB]
  • 8. Considerations
    Be REASONABLE
    ■ Can afford some data loss?
      ○ IOT measurements
      ○ Logs and Traces
    ■ Does it make sense to forklift EVERYTHING?
      ○ Split the work in smaller steps
      ○ Retention periods
    ■ Plan Ahead
      ○ Identify pain points and improvements early
      ○ Save your own time
      ○ TEST
  • 10. Streaming: Bulk Load
    DynamoDB → ScyllaDB
    ■ The good
      ○ No Migration!
      ○ Simplified data modeling
    ■ The bad
      ○ Out of Order writes?
      ○ Implement record versioning?
      ○ Compression?
    ■ Considerations
      ○ Differences between your source and target databases
      ○ Identify room for improvement
    [Diagram: Client, Client, Downstream feed]
  • 11. Dual-writes and Out of Order Writes
    [Diagram: Client; Writes, Writes, Reads, Writes; "Which one wins?"]
  • 12. Problem: Out of Order Writes
    ■ The CQL protocol allows one to manipulate the timestamp of writes
    ■ The usual need is to always persist the latest record, so that an older write never overwrites the latest (valid) one
    ■ Improvement:
      ● No need for LWT
      ● No need for read-before-write
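The rule behind slide 12 can be sketched without a database. CQL resolves competing writes to the same cell by keeping the one with the highest write timestamp, and `USING TIMESTAMP` lets the client supply that timestamp. In the sketch below, `Cell` and `apply_write` are hypothetical names that only simulate this rule in memory, and the keyspace, table and column names in the CQL string are assumptions shown for shape only.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Shape of a CQL write that carries its own timestamp (microseconds).
# ks.events, id and payload are assumed names, not taken from the slides.
INSERT_CQL = "INSERT INTO ks.events (id, payload) VALUES (?, ?) USING TIMESTAMP ?"


@dataclass
class Cell:
    value: str
    write_ts: int  # microseconds since epoch, supplied by the writer


def apply_write(table: Dict[str, Cell], key: str, value: str, write_ts: int) -> None:
    """Simulate last-write-wins: keep whichever write has the higher timestamp."""
    current: Optional[Cell] = table.get(key)
    # Simplification: on equal timestamps this sketch keeps the existing cell;
    # real databases apply their own tie-breaking rules.
    if current is None or write_ts > current.write_ts:
        table[key] = Cell(value, write_ts)
    # A stale write is simply dropped; the client never reads before writing.


table: Dict[str, Cell] = {}
# The newer record (ts=200) arrives first; the stale one (ts=100) arrives late.
apply_write(table, "user-1", "new", 200)
apply_write(table, "user-1", "old", 100)
print(table["user-1"].value)
```

Because the ordering decision travels with the write itself, the late-arriving stale record cannot replace the newer one, which is why neither LWT nor a read-before-write is needed in this scenario.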
  • 13. Engagement Platform: TTL'd data
    ScyllaDB → ScyllaDB
    ■ The good
      ○ Data modeling good to go!
      ○ Lift and Shift
    ■ The bad
      ○ Window for data loss
      ○ Manual process
      ○ No turning back
    ■ Considerations
      ○ Test and time each step thoroughly
      ○ Potentially repeat the process after initial switch
    [Diagram: Client, Forklift snapshot]
  • 14. TTL Data: How it is typically done
    [Diagram: Client; Writes; Reads / Writes; TTL Expires; Reads]
  • 15. Load and Stream: How we did it
    [Diagram: Client; Transfer snapshots; snapshot (×7); nodetool refresh --load-and-stream]
  • 16. Messaging App: Shadow Cluster
    Cassandra → ScyllaDB
    ■ The good
      ○ Same protocol
      ○ Zero data loss / User impact
    ■ The bad
      ○ Expensive ($$ and time)
      ○ Increased app complexity
    ■ Considerations
      ○ Throttle and balance source system traffic
      ○ Test under different scenarios
    [Diagram: Client, CQL, CQL]
  • 17. CQL – TTL & WRITETIME Quirks
    ■ Complex data-types are challenging
      ● UDTs, collections (frozen & non-frozen)
    ■ Heavy use of collections could introduce performance overhead
    ■ USING TIMESTAMP: Manipulate timestamps (WRITETIME)
    ■ USING TTL: Manipulate TTL
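As a rough illustration of slide 17, the sketch below builds a target-side write that carries over a cell's original write time and remaining TTL. `WRITETIME()`, `TTL()`, `USING TIMESTAMP` and `USING TTL` are standard CQL; the keyspace, table and column names and the `build_insert` helper are assumptions made for this example. It covers a single regular column only; as the slide notes, UDTs and collections are harder to handle.

```python
from typing import List, Optional, Tuple

# Source-side read: WRITETIME() and TTL() are evaluated per column.
# src_ks.tbl, dst_ks.tbl, id and val are assumed names for illustration.
SELECT_CQL = "SELECT id, val, WRITETIME(val), TTL(val) FROM src_ks.tbl"


def build_insert(row_id: str, val: str, writetime_us: int,
                 ttl_s: Optional[int]) -> Tuple[str, tuple]:
    """Return (statement, bind values) that preserve WRITETIME and remaining TTL."""
    cql = "INSERT INTO dst_ks.tbl (id, val) VALUES (?, ?) USING TIMESTAMP ?"
    params: List[object] = [row_id, val, writetime_us]
    # TTL(val) is null when the cell never expires; add the clause only if set.
    if ttl_s is not None:
        cql += " AND TTL ?"
        params.append(ttl_s)
    return cql, tuple(params)


statement, values = build_insert("a", "x", 1_700_000_000_000_000, 3600)
print(statement)
print(values)
```

Carrying the remaining TTL instead of the original one keeps the expiry moment on the target close to what it would have been on the source.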
  • 18. CQL to CQL – Under the hood
    ■ Data is distributed in a Token Ring
      ● Scan through: SELECT * FROM table WHERE token >= ? AND token < ?
      ● Save progress: With dual-writing there's no need to scan a token range again
    ■ If source has complex types:
      ● Hardcode TARGET timestamp to (time_before_migration_starts + grace_period)
    ■ Parallelize your work: Move faster, but careful with your source
    [Diagram: Token Ring, Migration Parallelism]
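The token-range scan from slide 18 can be sketched as follows, assuming the Murmur3 partitioner, whose tokens span -2^63 to 2^63 - 1. The helper names (`split_token_ring`, `pending`) and the table and column names are hypothetical. The slide's `token >= ?` is shorthand: in real CQL the function takes the partition key, as in `token(pk)`. This sketch also uses inclusive upper bounds rather than the slide's `<`, so that the maximum token is not skipped.

```python
from typing import List, Set, Tuple

MIN_TOKEN = -(2 ** 63)   # lowest Murmur3 token
MAX_TOKEN = 2 ** 63 - 1  # highest Murmur3 token

# ks.tbl and pk are assumed names; bind one (start, end) pair per scan.
SCAN_CQL = "SELECT * FROM ks.tbl WHERE token(pk) >= ? AND token(pk) <= ?"

TokenRange = Tuple[int, int]


def split_token_ring(n: int) -> List[TokenRange]:
    """Split the full ring into n contiguous, non-overlapping inclusive ranges."""
    if n < 1:
        raise ValueError("n must be at least 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n
    ranges: List[TokenRange] = []
    start = MIN_TOKEN
    for i in range(n):
        # The last range absorbs any remainder so the ring is fully covered.
        end = MAX_TOKEN if i == n - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def pending(ranges: List[TokenRange], done: Set[TokenRange]) -> List[TokenRange]:
    """Saved progress: return only the ranges that still need to be scanned."""
    return [r for r in ranges if r not in done]


ranges = split_token_ring(8)
done = {ranges[0], ranges[1]}  # e.g. loaded from a checkpoint file
print(len(pending(ranges, done)), "ranges left to scan")
```

Each pending range can be handed to a separate worker. The number of ranges and workers is the knob for the slide's trade-off between moving faster and overloading the source cluster.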
  • 19. Genealogy Platform: Counter Tables
    Cassandra → ScyllaDB
    ■ The good
      ○ Nothing, really
    ■ The bad
      ○ Counters 🥲
      ○ Extremely complex
    ■ Considerations
      ○ Understand your data types
      ○ Consider:
        ■ Introducing a small (seconds) offline window OR;
        ■ Split updates to another source of truth
    [Diagram: Client, CQL, CQL, Forklift snapshot]
  • 20. The problem with Counters
    ■ A Conflict-free Replicated Data Type
      ● Concurrent updates converge to a stable value
    ■ Supports increment and decrement
      ● Can NOT be set to a given value!
    ■ Cassandra < 2.1 counters are dangerous:
      ● See: https://github.com/scylladb/scylladb/issues/4219
  • 22. Counters: How we did it
    [Diagram: Client; Transfer snapshots to Final (empty) table; snapshot (×7); Write to Delta table; Write to Final (restored) table; Sideload (inc/dec) Delta to Final]
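One possible reading of the slide 22 diagram, simulated in memory: while snapshots are transferred into the Final table, live counter updates are recorded in a Delta table, and once the restore completes the accumulated deltas are applied to Final as increments or decrements. The dictionaries, the `sideload` helper and the CQL names below are assumptions for illustration, not the presenter's actual tooling.

```python
from typing import Dict

# A counter can only be incremented or decremented, never set directly.
# ks.final, hits and id are assumed names for illustration.
UPDATE_CQL = "UPDATE ks.final SET hits = hits + ? WHERE id = ?"


def record_delta(delta: Dict[str, int], key: str, amount: int) -> None:
    """Accumulate a live update in the Delta table while Final is being restored."""
    delta[key] = delta.get(key, 0) + amount


def sideload(final: Dict[str, int], delta: Dict[str, int]) -> None:
    """Apply each accumulated delta to Final exactly once, then empty Delta."""
    for key, amount in delta.items():
        final[key] = final.get(key, 0) + amount
    # Increments are not idempotent: replaying a delta would double count.
    delta.clear()


snapshot = {"a": 10, "b": 5}  # counter values captured in the source snapshot
delta: Dict[str, int] = {}

# Updates arriving while the snapshot is still being transferred.
record_delta(delta, "a", 3)
record_delta(delta, "b", -2)
record_delta(delta, "c", 1)

final = dict(snapshot)  # snapshot restored into the Final table
sideload(final, delta)
print(final)
```

After the sideload, clients would write to the Final table directly. The sketch leaves out the cut-over moment itself, which is where the short offline window or separate source of truth mentioned on slide 19 would come in.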
  • 24. ShareChat: (Large)² Scale Migration
    https://www.scylladb.com/presentations/sharechats-journey-migrating-100tb-of-data-to-scylladb-with-no-downtime/
  • 25. Keep in touch!
    Felipe Mendes, Solution Architect, ScyllaDB
    felipemendes@scylladb.com
    LinkedIn