How did we migrate data for millions
of live users from MySQL to
Cassandra
Andrey Panasyuk, @defascat
Plan
Use case
Challenges
a. Individual
b. Corporate
Servers
● Thousands of servers in prod
● Java 8
● Tomcat 7
● Spring 3
● Hibernate 3
Sharded MySQL. Current state
Sharded MySQL. Environment
1. MySQL (Percona Server)
2. Hardware configuration:
a. two Intel E2620v2 CPU
b. 128GB RAM
c. 12x800GB Intel SSD, RAID 10
d. two 2Gb network interfaces (bonded)
MemcacheD
1. Hibernate
a. Query Cache
b. Entity Cache
2. Hundreds of nodes
3. ~100MBps per Memcache node
Sharded MySQL. Failover
1. master
2. co-master
3. flogger
4. archive
X4
Sharded MySQL. Approach
1. Hibernate changes:
a. Patching 2nd level caching:
i. +environment
ii. -class version
b. More info to debug problems
c. Fixing bugs
2. Own implementation:
a. FitbitTransactional
b. ManagedHibernateSession
3. Dynamic sharding concept (somewhat similar to C*)
Sharded MySQL. Data migration
Solution: vBucket
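
The vBucket idea can be sketched as two-level routing: a key hashes to a fixed virtual bucket, and a small lookup table maps each virtual bucket to a physical shard. Rebalancing then means reassigning buckets rather than rehashing keys. This is an illustration only, not the implementation from the talk: the class name `VBucketRouter`, the bucket count and the modulo hash are all assumptions.

```java
// Hypothetical sketch: two-level routing, key -> vBucket -> shard.
final class VBucketRouter {
    private final int[] shardByBucket; // index = vBucket id, value = shard id

    VBucketRouter(int bucketCount, int shardCount) {
        shardByBucket = new int[bucketCount];
        for (int b = 0; b < bucketCount; b++) {
            shardByBucket[b] = b % shardCount; // initial even spread (assumption)
        }
    }

    // A key always maps to the same vBucket, regardless of how many shards exist.
    int bucketFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardByBucket.length);
    }

    int shardFor(long userId) {
        return shardByBucket[bucketFor(userId)];
    }

    // Rebalancing reassigns a bucket; the keys inside it are never rehashed.
    void moveBucket(int bucket, int newShard) {
        shardByBucket[bucket] = newShard;
    }
}
```

Adding shards only changes entries of the lookup table, which is why the unit of migration is a bucket and not an individual row.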
Sharded MySQL. Data migration
Migration (96 -> 152 shards):
● vBuckets to move: 96579
● 1 bucket migration time: 8 min
● 10 Bucketmovers * 3 processes: 12 days
Sharded MySQL. Data migration
Job
● Setup
a. Ensures vbuckets in read-only mode
b. Waits for servers to reach consensus
● Execute
a. Triggers actions (dump, insert, etc.) on Bucketmover
b. Waits for actions to complete
● Wrap-up
a. Updates shards for vbuckets, re-opens them for writes
b. Advances jobs to next action
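
The three job phases can be sketched as one method that freezes the vBucket, runs the actions, then switches the shard and re-opens writes. This is a simplified illustration: `BucketMoveJob` and its fields are hypothetical, the action list contains only the two actions named on the slide, and the consensus wait and the remote calls to a Bucketmover are replaced by log entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a single vBucket move; all names are illustrative.
final class BucketMoveJob {
    enum Action { DUMP, INSERT } // the slide lists "dump, insert, etc."

    final List<String> log = new ArrayList<>();
    private boolean readOnly;
    private int shard;

    BucketMoveJob(int sourceShard) {
        this.shard = sourceShard;
    }

    void run(int targetShard) {
        // Setup: block writes before any data is copied.
        readOnly = true;
        log.add("setup: vBucket read-only");

        // Execute: trigger each action in order (stand-in for Bucketmover calls).
        for (Action action : Action.values()) {
            log.add("execute: " + action);
        }

        // Wrap-up: point the vBucket at its new shard and re-open it for writes.
        shard = targetShard;
        readOnly = false;
        log.add("wrap-up: shard " + shard + ", writes re-opened");
    }

    boolean isReadOnly() { return readOnly; }
    int shard() { return shard; }
}
```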
Sharded MySQL. Schema migration
1. Locks during schema update
Solution: pt-online-schema-change + protobuf
Drawbacks:
1. Split between DML/DDL scripts
2. Binary format (additional data)
3. Additional platform specific tool
message Meta {
optional string name = 1;
optional string intro = 2;
...
repeated string requiredFeatures = 32;
}
message Challenge {
optional Meta meta = 1;
...
optional CWRace cw_race = 6;
}
Sharded MySQL. Development
1. Job system across shards
2. Use unsharded databases for lookup tables
3. Do not forget about custom annotation
@PrimaryEntity(entityType = EntityType.SHARDED_ENTITY)
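
A marker annotation like this is typically declared with runtime retention and read through reflection when the persistence layer decides where an entity lives. The sketch below assumes that shape. Only `PrimaryEntity`, `EntityType` and `SHARDED_ENTITY` come from the slide; `UNSHARDED_ENTITY`, the two entity classes and `EntityRouting` are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// UNSHARDED_ENTITY is an assumed second value, added for illustration.
enum EntityType { SHARDED_ENTITY, UNSHARDED_ENTITY }

@Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
@Target(ElementType.TYPE)
@interface PrimaryEntity {
    EntityType entityType();
}

@PrimaryEntity(entityType = EntityType.SHARDED_ENTITY)
class ShardedMessage { }

// No annotation: this hypothetical routing treats the class as unsharded.
class LookupCountry { }

final class EntityRouting {
    static boolean isSharded(Class<?> entityClass) {
        PrimaryEntity marker = entityClass.getAnnotation(PrimaryEntity.class);
        return marker != null && marker.entityType() == EntityType.SHARDED_ENTITY;
    }
}
```

With this shape, forgetting the annotation silently routes an entity to the unsharded database, which is one plausible reason the slide calls it out.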
Query patterns
1. Create challenge
2. List challenges by user
3. Get challenge team leaderboard by user
4. Post a message
5. List challenge messages
6. Cheer a message
MySQL. Not a problem
1. Knowledge Base
2. Response Time
Our problems
1. MySQL
a. Scalability
b. Fault tolerance
c. Schema migration
d. Locks
2. Infrastructure cost
a. MemcacheD
b. Redis
C* expectations
1. Scalability
2. Quicker fault recovery
3. Easier schema migration
4. Lack of locks
Migration specifics
1. Millions of real users in prod
2. No downtime
Apache Cassandra
Apache Cassandra is a free and open-source distributed database
management system designed to handle large amounts of data across
many commodity servers, providing high availability with no single
point of failure. Cassandra offers robust support for clusters spanning
multiple datacenters, with asynchronous masterless replication allowing
low latency operations for all clients.
Setting cluster up
1. Performance testing
2. Monitoring
3. Alerting
4. Incremental repairs (CASSANDRA-9935)
C* tweaks
1. ParNew+CMS -> G1
2. MaxGCPauseMillis = 200ms
3. ParallelGCThreads and ConcGCThreads = 4
4. Compaction
5. gc_grace_seconds = 0 (already big TTL for our data)
Create keyspaces/tables
1. Almost the same schema with Cassandra adjustments
2. Data denormalization was required in several places
ID migration
1. Create pseudo-random migration UUID based on BIGINT
2. Thank the API designers for using strings as object IDs.
3. Make sure clients are ready for the new length of the id.
4. Migrate API to UUID all over the place
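
One way to derive a random-looking but reproducible UUID from a legacy BIGINT id is a name-based UUID. The sketch below uses `UUID.nameUUIDFromBytes` from the JDK. It is an assumption that this matches the scheme used in the talk; the `MigrationIds` class and the entity-type prefix are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: maps a legacy numeric id to a stable UUID.
final class MigrationIds {
    // The entity-type prefix (an assumption) keeps ids of different tables from colliding.
    static UUID fromLegacyId(String entityType, long legacyId) {
        byte[] prefix = entityType.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(prefix.length + Long.BYTES);
        buf.put(prefix).putLong(legacyId);
        // Name-based (version 3, MD5) UUID: the same input always yields the same UUID.
        return UUID.nameUUIDFromBytes(buf.array());
    }
}
```

Because the mapping is deterministic, a re-run of the migration job produces the same ids, and clients that still hold the old numeric id can be translated on the fly.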
DAO (Data Access Object)
1. Create CassandraDAO with the same interface as HibernateDAO
2. Create ProxyAdapterDAO to control which implementation to select
3. Create adapter implementation for each DAO with the same
interface as HibernateDAO
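
The DAO layering above can be sketched as one interface with interchangeable backends and a proxy that picks one per call. The names `ChallengeDao`, `InMemoryChallengeDao` and `ProxyAdapterChallengeDao` are hypothetical, and in-memory maps stand in for the Hibernate and Cassandra implementations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Shared contract; both backends implement it, so callers never change.
interface ChallengeDao {
    void save(String id, String name);
    String findName(String id);
}

// Stand-in for either the Hibernate-backed or the Cassandra-backed DAO.
final class InMemoryChallengeDao implements ChallengeDao {
    private final Map<String, String> rows = new HashMap<>();

    @Override public void save(String id, String name) { rows.put(id, name); }
    @Override public String findName(String id) { return rows.get(id); }
}

// Chooses the implementation on every call, driven by a config flag.
final class ProxyAdapterChallengeDao implements ChallengeDao {
    private final ChallengeDao hibernateDao;
    private final ChallengeDao cassandraDao;
    private final BooleanSupplier useCassandra;

    ProxyAdapterChallengeDao(ChallengeDao hibernateDao, ChallengeDao cassandraDao,
                             BooleanSupplier useCassandra) {
        this.hibernateDao = hibernateDao;
        this.cassandraDao = cassandraDao;
        this.useCassandra = useCassandra;
    }

    private ChallengeDao active() {
        return useCassandra.getAsBoolean() ? cassandraDao : hibernateDao;
    }

    @Override public void save(String id, String name) { active().save(id, name); }
    @Override public String findName(String id) { return active().findName(id); }
}
```

Reading the flag on each call, rather than once at startup, lets a configuration change take effect without a redeploy.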
Enable shadow writes (percentage)
1. Introduce environment specific settings for shadow writes
2. Adjust ProxyAdapterDAO code to enable shadow writes by
percentage. Various implementations.
3. Analyze performance (StatsD metrics for our code + Cassandra
metrics)
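
A percentage-based shadow write might look like the sketch below: the MySQL write always happens, the Cassandra write happens for a configurable share of calls, and a shadow failure is counted but never reaches the caller. The slide mentions "various implementations", so this is only one possibility. `ShadowWriter` is hypothetical, maps stand in for both stores, and the counter stands in for a StatsD metric.

```java
import java.util.Map;
import java.util.function.DoubleSupplier;

// Hypothetical sketch of percentage-gated shadow writes.
final class ShadowWriter {
    private final Map<String, String> mysql;     // primary store stand-in
    private final Map<String, String> cassandra; // shadow store stand-in
    private final double shadowPercent;          // 0..100, from environment settings
    private final DoubleSupplier random;         // yields values in [0, 1)
    int shadowFailures = 0;                      // stand-in for a StatsD counter

    ShadowWriter(Map<String, String> mysql, Map<String, String> cassandra,
                 double shadowPercent, DoubleSupplier random) {
        this.mysql = mysql;
        this.cassandra = cassandra;
        this.shadowPercent = shadowPercent;
        this.random = random;
    }

    void write(String key, String value) {
        mysql.put(key, value); // the primary write decides success or failure
        if (random.getAsDouble() * 100.0 < shadowPercent) {
            try {
                cassandra.put(key, value);
            } catch (RuntimeException e) {
                shadowFailures++; // record the failure, never propagate it to users
            }
        }
    }
}
```

In production code the random source could be `ThreadLocalRandom.current()::nextDouble`; injecting it keeps the behaviour testable.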
Migrate legacy data
1. Create a new job to read/migrate data
2. Process data in batches
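
Batch processing of legacy data can be sketched as a loop that slices the source into fixed-size pages and hands each page to a writer. `BatchMigrator` is a hypothetical name, and an in-memory list stands in for paged reads from MySQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: copy a source in fixed-size batches.
final class BatchMigrator {
    // Returns the number of batches handed to the writer.
    static <T> int migrate(List<T> source, int batchSize, Consumer<List<T>> writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            // Copy the slice so the writer cannot be affected by later source changes.
            writer.accept(new ArrayList<>(source.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

Small batches bound the memory used per step and make it cheap to resume or re-run a failed range.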
Start shadow C* reads with validation
1. Environment specific settings for data validation
2. Adjust ProxyAdapterDAO code to enable simultaneous read from
MySQL and Cassandra
3. Adjust ProxyAdapterDAO to be able to compare objects
4. Logging & investigating data discrepancy.
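
The validation step can be sketched as a reader that queries both stores, records any mismatch, and still returns the MySQL result, so users never see unvalidated Cassandra data. `ValidatingReader` is a hypothetical name, functions stand in for the two DAOs, and a list stands in for the discrepancy log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of a shadow read with validation.
final class ValidatingReader<K, V> {
    private final Function<K, V> mysqlRead;
    private final Function<K, V> cassandraRead;
    final List<String> discrepancies = new ArrayList<>(); // stand-in for logging

    ValidatingReader(Function<K, V> mysqlRead, Function<K, V> cassandraRead) {
        this.mysqlRead = mysqlRead;
        this.cassandraRead = cassandraRead;
    }

    V read(K key) {
        V expected = mysqlRead.apply(key);
        try {
            V actual = cassandraRead.apply(key);
            if (!Objects.equals(expected, actual)) {
                discrepancies.add(key + ": mysql=" + expected + ", cassandra=" + actual);
            }
        } catch (RuntimeException e) {
            // A failing shadow read is a finding to investigate, not a user-facing error.
            discrepancies.add(key + ": cassandra read failed: " + e);
        }
        return expected; // MySQL remains the source of truth at this stage
    }
}
```

Comparison relies on `equals`, so the domain objects need a meaningful `equals` implementation for the check to be useful.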
Check validation issues
1. Path
a. Fix code problems
b. Migrate affected challenges again
c. Go to step 1
2. Duration: 1.5 months
Turn on read from C*
1. Introduce a C* read percentage in the config settings
2. Still do shadow MySQL reads and validations
3. Increase percentage over time
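
The slides do not say how the read percentage was applied. One option, sketched here as an assumption, is stable bucketing by object id: each id falls into a fixed slot from 0 to 99, so raising the percentage only ever adds ids to the Cassandra side and never flips one back. `ReadRollout` is a hypothetical name.

```java
// Hypothetical sketch: stable percentage rollout keyed by object id.
final class ReadRollout {
    // The same id always lands in the same slot, so decisions are repeatable.
    static int slotFor(String id) {
        return Math.floorMod(id.hashCode(), 100);
    }

    // percent = 0 sends nothing to Cassandra, percent = 100 sends everything.
    static boolean readFromCassandra(String id, int percent) {
        return slotFor(id) < percent;
    }
}
```

A random per-request choice would also satisfy the slide; the stable variant makes a reported problem easier to reproduce because a given id behaves the same way on every request.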
Turn off writes to MySQL
Clean-up
1. Adjust code that does not suit C* access patterns, such as scanning
through all of the shards.
2. Adjust adapters to get rid of Hibernate DAOs. The adapter hierarchy is
still present
3. Remove obsolete code
4. Clean up MySQL database
Challenge Events Migration Example
1. Previous attempts:
a. SWF + SQS
b. MySQL + Job across all shards
2. Now
a. Complicated by the poor performance of C* when used as a queue
b. 16 threads across 1024 buckets
Code Redesign. Message cheer example
1. Originally
a. Read
b. Update BLOB
c. Persist
2. Approach
a. Update C* set as a single operation
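
One reason the redesign helps without transactions: when two clients read, modify and persist the whole BLOB at the same time, the second write overwrites the first, whereas "add this element to the set" can be applied in any order without losing a cheer. The sketch below simulates both flows in memory. `CheerDemo`, the user names, and the table and column names in the CQL string are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory simulation of the two cheer flows.
final class CheerDemo {
    // Original flow: each client reads the whole value, edits a copy, writes it back.
    static Set<String> readModifyWrite() {
        Set<String> stored = new HashSet<>();
        Set<String> copyA = new HashSet<>(stored); // client A reads
        Set<String> copyB = new HashSet<>(stored); // client B reads concurrently
        copyA.add("alice");
        copyB.add("bob");
        stored = copyA; // A persists
        stored = copyB; // B persists and silently overwrites A's cheer
        return stored;
    }

    // Redesigned flow: each client sends only the element to add.
    static Set<String> setAdd() {
        Set<String> stored = new HashSet<>();
        stored.add("alice");
        stored.add("bob");
        return stored;
    }

    // Shape of the single-operation CQL update; table and column names are assumptions.
    static final String CHEER_CQL =
        "UPDATE challenge_messages SET cheered_by = cheered_by + ? WHERE message_id = ?";
}
```

In the first flow only the last writer's cheer survives; in the second both cheers are kept regardless of ordering.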
Code Redesign. Life without transactions
1. BATCH
2. Some object as a single source of truth
Challenges C*. Current State
1. Two datacenters
2. 18 nodes
3. Hardware
a. 24-core CPU
b. 64 GB RAM
4. RF: 3
Results of migration
1. Significant improvement in persistent storage scalability &
management (compared to the MySQL RDBMS)
2. Minimized the number of external points of failure
3. Squashing Technical Debt
4. Created a reusable migration module
Cassandra Inconveniences
1. Lack of ACID transactions
2. Multi-DC scenarios require conscious decisions about
QUORUM/LOCAL_QUORUM.
3. Data denormalization
4. CQL vs SQL limitations
5. Less readable IDs
Surprisingly not a big deal
1. Lack of JOINs due to the model
2. Lack of aggregation functions due to the model (we’re on 2.1 now)
3. Eventual consistency
4. IDs format change
Migration from MySQL to Cassandra for millions of active users

UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 

Migration from MySQL to Cassandra for millions of active users

  • 1. How did we migrate data for millions of live users from MySQL to Cassandra Andrey Panasyuk, @defascat
  • 4. Servers ● Thousands of servers in prod ● Java 8 ● Tomcat 7 ● Spring 3 ● Hibernate 3
  • 7. Sharded MySQL. Environment 1. MySQL (Percona Server) 2. Hardware configuration: a. two Intel E2620v2 CPU b. 128GB RAM c. 12x800GB Intel SSD, RAID 10 d. two 2Gb network interfaces (bonded)
  • 8. MemcacheD 1. Hibernate a. Query Cache b. Entity Cache 2. Hundreds of nodes 3. ~100MBps per Memcache node
  • 9. Sharded MySQL. Failover 1. master 2. co-master 3. flogger 4. archive X4
  • 10. Sharded MySQL. Approach 1. Hibernate changes: a. Patching 2nd level caching: i. +environment ii. -class version b. More info to debug problems c. Fixing bugs 2. Own implementation: a. FitbitTransactional b. ManagedHibernateSession 3. Dynamic sharding concept (somewhat similar to C*)
  • 11. Sharded MySQL. Data migration Solution: vBucket
  • 12. Sharded MySQL. Data migration Migration (96 -> 152 shards): ● vBuckets to move: 96579 ● 1 bucket migration time: 8 min ● 10 bucketmover * 3 processes - 12 days
  • 13. Sharded MySQL. Data migration Job ● Setup a. Ensures vbuckets in read-only mode b. Waits for servers to reach consensus ● Execute a. Triggers actions (dump, insert, etc.) on Bucketmover b. Waits for actions to complete ● Wrap-up a. Updates shards for vbuckets, re-opens them for writes b. Advances jobs to next action
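The vBucket idea on slides 11 to 13 can be sketched as a two-step lookup: an entity key maps to a fixed vBucket, and a small table maps each vBucket to a shard. Moving data then only changes that table. This is a minimal illustration, not the implementation from the talk; the class name, the modulo-based key mapping, and the bucket and shard counts are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of vBucket routing: key -> vBucket -> shard. */
class VBucketRouter {
    private final int vBucketCount;
    // vBucket id -> shard id. Only this table changes when a bucket is moved.
    private final Map<Integer, Integer> shardByVBucket = new HashMap<>();

    VBucketRouter(int vBucketCount, int initialShardCount) {
        this.vBucketCount = vBucketCount;
        for (int bucket = 0; bucket < vBucketCount; bucket++) {
            // Assumed initial layout: buckets spread round-robin over the shards.
            shardByVBucket.put(bucket, bucket % initialShardCount);
        }
    }

    /** The key-to-bucket mapping never changes, so keys do not need rehashing. */
    int vBucketFor(long userId) {
        return (int) Math.floorMod(userId, (long) vBucketCount);
    }

    int shardFor(long userId) {
        return shardByVBucket.get(vBucketFor(userId));
    }

    /** Called in the wrap-up step, after the bucket's rows were copied to the target shard. */
    void moveVBucket(int vBucket, int targetShard) {
        if (vBucket < 0 || vBucket >= vBucketCount) {
            throw new IllegalArgumentException("unknown vBucket " + vBucket);
        }
        shardByVBucket.put(vBucket, targetShard);
    }
}
```

With this layout, adding shards means moving some vBuckets one at a time while every other bucket keeps serving traffic from its current shard.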
  • 14. Sharded MySQL. Schema migration 1. Locks during schema update Solution: pt-online-schema-change + protobuf Drawbacks: 1. Split between DML/DDL scripts 2. Binary format (additional data) 3. Additional platform-specific tool
    message Meta {
      optional string name = 1;
      optional string intro = 2;
      ...
      repeated string requiredFeatures = 32;
    }
    message Challenge {
      optional Meta meta = 1;
      ...
      optional CWRace cw_race = 6;
    }
  • 15. Sharded MySQL. Development 1. Job system across shards 2. Use unsharded databases for lookup tables 3. Do not forget about custom annotation @PrimaryEntity(entityType = EntityType.SHARDED_ENTITY)
  • 16. Query patterns 1. Create challenge 2. List challenges by user 3. Get challenge team leaderboard by user 4. Post a message 5. List challenge messages 6. Cheer a message
  • 17. MySQL. Not a problem 1. Knowledge Base 2. Response Time
  • 18. Our problems 1. MySQL a. Scalability b. Fault tolerance c. Schema migration d. Locks 2. Infrastructure cost a. MemcacheD b. Redis
  • 19. C* expectations 1. Scalability 2. Quicker fault recovery 3. Easier schema migration 4. Lack of locks
  • 20. Migration specifics 1. Millions of real users in prod 2. No downtime
  • 22. Apache Cassandra Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
  • 23. Setting cluster up 1. Performance testing 2. Monitoring 3. Alerting 4. Incremental repairs (CASSANDRA-9935)
  • 24. C* tweaks 1. ParNew+CMS -> G1 2. MaxGCPauseMillis = 200ms 3. ParallelGCThreads and ConcGCThreads = 4 4. Compaction 5. gc_grace_seconds = 0 (already big TTL for our data)
  • 25. Create keyspaces/tables 1. Almost the same schema with Cassandra adjustments 2. Data denormalization was required in several places
  • 26. ID migration 1. Create pseudo-random migration UUID based on BIGINT 2. Thank API designers for using string as object ids. 3. Make sure clients are ready for the new length of the id. 4. Migrate API to UUID all over the place
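One possible way to derive a stable UUID from a legacy BIGINT id is a name-based UUID, sketched below. The slides do not say which scheme was actually used, so this is an assumption; the class name, the `entityType` prefix, and the `type:id` name format are hypothetical. The useful property is determinism: re-running the migration for the same row yields the same UUID.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical helper: maps legacy numeric ids to UUIDs for the migration. */
class MigrationIds {
    /**
     * Builds a name-based (version 3, MD5) UUID from the entity type and legacy id.
     * The same inputs always produce the same UUID.
     */
    static UUID fromLegacyId(String entityType, long legacyId) {
        String name = entityType + ":" + legacyId; // assumed naming convention
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }
}
```

The string form of such an id is 36 characters long, which is why clients that previously saw short numeric strings need to be checked for length assumptions.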
  • 27. DAO (Data Access Object) 1. Create CassandraDAO with the same interface as HibernateDAO 2. Create ProxyAdapterDAO to control which implementation to select 3. Create adapter implementation for each DAO with the same interface as HibernateDAO
  • 28. Enable shadow writes (percentage) 1. Introduce environment specific settings for shadow writes 2. Adjust ProxyAdapterDAO code to enable shadow writes by percentage. Various implementations. 3. Analyze performance (StatsD metrics for our code + Cassandra metrics)
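A proxy DAO with percentage-based shadow writes could look roughly like the sketch below. `ProxyAdapterDAO` is named in the slides, but its code is not shown, so everything here is an assumption: the `ChallengeDao` interface, the in-memory `RecordingDao` stand-in, and the injected `dice` supplier (which would be something like `() -> ThreadLocalRandom.current().nextInt(100)` in a real deployment).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

/** Hypothetical common interface shared by the Hibernate and Cassandra DAOs. */
interface ChallengeDao {
    void save(String challengeId, String payload);
}

/** In-memory stand-in for a real DAO, used only to make the sketch runnable. */
class RecordingDao implements ChallengeDao {
    final List<String> saved = new ArrayList<>();

    @Override
    public void save(String challengeId, String payload) {
        saved.add(challengeId);
    }
}

/** Always writes to the primary store and mirrors a configured share of writes to the shadow. */
class ShadowWriteProxyDao implements ChallengeDao {
    private final ChallengeDao primary;   // MySQL via Hibernate in the talk's setup
    private final ChallengeDao shadow;    // Cassandra
    private final int shadowWritePercent; // 0..100, environment-specific setting
    private final IntSupplier dice;       // returns a value in 0..99

    ShadowWriteProxyDao(ChallengeDao primary, ChallengeDao shadow,
                        int shadowWritePercent, IntSupplier dice) {
        this.primary = primary;
        this.shadow = shadow;
        this.shadowWritePercent = shadowWritePercent;
        this.dice = dice;
    }

    @Override
    public void save(String challengeId, String payload) {
        primary.save(challengeId, payload);
        if (dice.getAsInt() < shadowWritePercent) {
            try {
                shadow.save(challengeId, payload);
            } catch (RuntimeException e) {
                // A shadow failure must not fail the user request.
                // A real system would record a metric here instead of ignoring it.
            }
        }
    }
}
```

Because callers only see `ChallengeDao`, the percentage can be raised per environment without touching calling code.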
  • 29. Migrate legacy data 1. Create a new job to read/migrate data 2. Process data in batches
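Batch processing in the legacy-data job can be sketched as below. This shows only the batching loop; the class name, the generic row type, and the `writer` callback (standing in for "write this batch to Cassandra") are assumptions, and a real job would read pages from MySQL rather than hold all rows in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a batch loop for migrating legacy rows. */
class BatchMigrator {
    /**
     * Hands the rows to the writer in chunks of at most batchSize.
     * Returns the number of batches written.
     */
    static <T> int migrateInBatches(List<T> legacyRows, int batchSize, Consumer<List<T>> writer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < legacyRows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, legacyRows.size());
            // Copy the slice so the writer cannot be affected by later list changes.
            writer.accept(new ArrayList<>(legacyRows.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```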
  • 30. Start shadow C* reads with validation 1. Environment specific settings for data validation 2. Adjust ProxyAdapterDAO code to enable simultaneous read from MySQL and Cassandra 3. Adjust ProxyAdapterDAO to be able to compare objects 4. Logging & investigating data discrepancy.
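Shadow reads with validation can be sketched as a reader that queries both stores, compares the results, and still returns the MySQL value. This is an illustrative assumption about the shape of such code, not the talk's `ProxyAdapterDAO`; the class name, the `Function`-based readers, and the in-memory discrepancy list (a stand-in for logging) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sketch: read from both stores, compare, keep MySQL as the source of truth. */
class ValidatingReader<K, V> {
    private final Function<K, V> mysqlRead;
    private final Function<K, V> cassandraRead;
    // Stand-in for log output; a real system would log and emit metrics.
    final List<String> discrepancies = new ArrayList<>();

    ValidatingReader(Function<K, V> mysqlRead, Function<K, V> cassandraRead) {
        this.mysqlRead = mysqlRead;
        this.cassandraRead = cassandraRead;
    }

    V read(K key) {
        V expected = mysqlRead.apply(key);
        try {
            V actual = cassandraRead.apply(key);
            // Relies on V having a meaningful equals() implementation.
            if (!Objects.equals(expected, actual)) {
                discrepancies.add("key=" + key + " mysql=" + expected + " cassandra=" + actual);
            }
        } catch (RuntimeException e) {
            // The shadow read must never break the real read.
            discrepancies.add("key=" + key + " cassandra read failed: " + e);
        }
        return expected;
    }
}
```

Each recorded discrepancy is then a candidate for the fix, re-migrate, and re-check loop described on the next slide.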
  • 31. Check validation issues 1. Path a. Fix code problems b. Migrate affected challenges again c. Go to step 1 2. Duration: 1.5 months
  • 32. Turn on read from C* 1. Introduce C* return read percentage in the config settings 2. Still do shadow MySQL reads and validations 3. Increase percentage over time
  • 33. Turn off writes to MySQL
  • 34. Clean-up 1. Adjust places that do not suit C* patterns, such as scanning all of the shards. 2. Adjust adapters to get rid of Hibernate DAOs. The adapter hierarchy is still present 3. Remove obsolete code 4. Clean up MySQL database
  • 35. Challenge Events Migration Example 1. Previous attempts: a. SWF + SQS b. MySQL + Job across all shards 2. Now a. Complication: performance of C* when used as a queue b. 16 threads across 1024 buckets
  • 36. Code Redesign. Message cheer example 1. Originally a. Read b. Update BLOB c. Persist 2. Approach a. Update C* set as a single operation
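The difference between the two approaches on this slide can be illustrated with a small in-memory model. The slide does not state the motivation, so treating lost updates as the concern is an assumption, and the table and column names in the CQL string are hypothetical. The CQL shows the general form of adding an element to a set column; the two methods simulate what happens when two users cheer at the same time.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model contrasting read-update-persist with a single set-add operation. */
class CheerDemo {
    // Assumed schema: cheered_by is a set<text> column. Names are illustrative only.
    static final String ADD_CHEER_CQL =
            "UPDATE challenge_messages SET cheered_by = cheered_by + ? "
            + "WHERE challenge_id = ? AND message_id = ?";

    /** Two writers read the same snapshot, each adds a cheer, each persists the whole value. */
    static Set<String> readModifyWrite(Set<String> stored, String userA, String userB) {
        Set<String> snapshotA = new HashSet<>(stored);
        Set<String> snapshotB = new HashSet<>(stored);
        snapshotA.add(userA);
        snapshotB.add(userB);
        Set<String> persisted = snapshotA; // writer A persists first
        persisted = snapshotB;             // writer B overwrites it, losing A's cheer
        return persisted;
    }

    /** Each writer sends only "add my user"; the adds are merged, in any order. */
    static Set<String> setAppend(Set<String> stored, String userA, String userB) {
        Set<String> persisted = new HashSet<>(stored);
        persisted.add(userA);
        persisted.add(userB);
        return persisted;
    }
}
```

In the first method the second persist silently drops the first cheer, while the set-add form keeps both without a prior read.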
  • 37. Code Redesign. Life without transactions 1. BATCH 2. Some object as a single source of truth
  • 39. Challenges C*. Current State 1. Two datacenters 2. 18 nodes 3. Hardware a. 24-core CPU b. 64 GB RAM 4. RF: 3
  • 40. Results of migration 1. Significant improvement in persistent storage scalability & management (compared to MySQL RDBMS) 2. Minimizing the number of external points of failure 3. Squashing Technical Debt 4. Created a reusable migration module
  • 41. Cassandra Inconveniences 1. Lack of ACID transactions 2. Multi-DC scenarios require conscious decisions for QUORUM/LOCAL_QUORUM. 3. Data denormalization 4. CQL vs SQL limitations 5. Less readable IDs
  • 42. Surprisingly not a big deal 1. Lack of JOINs due to the model 2. Lack of aggregation functions due to the model (we’re on 2.1 now) 3. Eventual consistency 4. IDs format change