SHIFT.com
Migrating from MongoDB to Cassandra
by: Blake Eggleston & Jon Haddad

Cassandra meetup slides - Oct 15 Santa Monica Coloft
Slides from our presentation at the Santa Monica Coloft on our migration from MongoDB to Cassandra.

What is SHIFT.com?
Shift is a platform that enables marketers to communicate across organizations and departments in a single place. It’s also an open application platform with a set of applications built on top of it that can communicate with one another.

Initial Stack
● Python
○ Flask
○ Celery
● MongoDB
○ mongoengine
● Neo4j / Titan
○ Bulbs
○ thunderdome
● Redis
● AWS
○ m1.xlarge for mongo

Current Stack
● Python
○ still Flask
○ still Celery
○ gevent (it rocks)
● Cassandra
○ 1.2.6
○ cqlengine
● ElasticSearch
● Redis
○ jondis
● AWS
○ m1.xlarge

Why did we move to Cassandra?
● Operational benefits
○ Adding and removing nodes is much easier compared to MongoDB’s shards
● Control over our data on disk (LSMT: log-structured merge tree)
● Love CQL3
● Long-term scalability
○ Scales linearly
○ Multi-DC support baked in

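Why is adding a node easier? In Cassandra each node owns a range of a hashed token ring, so a new node takes over part of one existing node’s range and the rest of the cluster is untouched. The sketch below illustrates only that property. It assumes one token per node and uses MD5 purely for illustration; real clusters differ (partitioner choice, virtual nodes, replication), and `token` and `TokenRing` are hypothetical names, not Cassandra APIs.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Hash a key or node name onto the ring (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest(), "big")


class TokenRing:
    """Hypothetical ring: one token per node; a node owns (previous token, its token]."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        self._ring = sorted(self._ring + [(token(node), node)])

    def owner(self, key: str) -> str:
        tokens = [t for t, _ in self._ring]
        i = bisect_left(tokens, token(key))  # first node token >= key token
        # past the last token: wrap around to the first node on the ring
        return self._ring[i % len(self._ring)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = TokenRing(["A", "B", "C"])
before = {k: ring.owner(k) for k in keys}
ring.add_node("D")
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves goes to the new node D, and all of them come from the single node whose range D split; nothing else is reshuffled.
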
Migration Goals
● Zero downtime
○ We wanted to roll out Cassandra without any service interruptions
● No loss of performance
○ By carefully structuring our schema, we were able to match MongoDB’s performance

Migration Strategy

Benefits of CQL3
● Easy to understand if you’re coming from an RDBMS
● Collections
○ sets, lists, maps
● Batch queries
● Clustering keys
○ Handle ordering of logical rows
○ Saved us from a column-name management scheme and allowed us to focus on our data

Physical vs Logical Row

Single Row

Clustered Row

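Slide 8 credits clustering keys with replacing a hand-rolled column-name management scheme. Roughly, that scheme is what the storage engine still does underneath: all logical rows sharing a partition key live in one physical row, and each non-key value is stored as a cell whose name combines the clustering value with the column name. The following is a simplified model, assuming integer clustering values (standing in for timeuuids) and ignoring timestamps and other storage details; `to_physical_rows` is a hypothetical helper.

```python
from collections import defaultdict


def to_physical_rows(logical_rows, partition_key, clustering_key):
    """Fold CQL3 logical rows into physical rows, one per partition key.

    Every non-key column becomes a cell named by the composite
    (clustering value, column name). Sorting cells by that name is what
    keeps logical rows ordered by the clustering key inside a partition.
    """
    partitions = defaultdict(dict)
    for row in logical_rows:
        pk, ck = row[partition_key], row[clustering_key]
        for column, value in row.items():
            if column not in (partition_key, clustering_key):
                partitions[pk][(ck, column)] = value
    # sort cells by their composite name, as they would be laid out on disk
    return {pk: dict(sorted(cells.items())) for pk, cells in partitions.items()}


logical = [
    {"user_id": "u1", "message_id": 2, "body": "second", "sender": "bob"},
    {"user_id": "u1", "message_id": 1, "body": "first", "sender": "amy"},
    {"user_id": "u2", "message_id": 1, "body": "hello", "sender": "amy"},
]
for pk, cells in to_physical_rows(logical, "user_id", "message_id").items():
    print(pk, list(cells))
```

Two logical rows for `u1` become one physical row with four cells, ordered by `message_id` regardless of insert order.
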
Data Modelling Patterns
● Considerations: working with MongoDB’s DBRefs and optimizing layout on disk
● Structuring tables as materialized views of the queries we planned on using
● Moving multiple documents into a single physical row
● Creating supporting index tables for looking up logical rows

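The supporting-index-table pattern can be sketched with plain dictionaries standing in for two hypothetical tables: `messages_by_user`, laid out for the main query (all of a user’s messages in one physical row), and `message_lookup`, an index table that maps a message id back to its partition key. This is an in-memory illustration under those assumptions, not SHIFT’s actual schema.

```python
from collections import defaultdict


class MessageStore:
    """Dicts standing in for two hypothetical tables:

    messages_by_user: user_id -> {message_id: body}, one physical row per user
    message_lookup:   message_id -> user_id, a supporting index table
    """

    def __init__(self):
        self.messages_by_user = defaultdict(dict)
        self.message_lookup = {}

    def save(self, user_id, message_id, body):
        # Every write updates both tables; grouping them in one CQL batch
        # is an assumption here, not something the slides specify.
        self.messages_by_user[user_id][message_id] = body
        self.message_lookup[message_id] = user_id

    def messages_for(self, user_id):
        # The query the main table was shaped for: read one partition in order.
        rows = self.messages_by_user.get(user_id, {})
        return [rows[message_id] for message_id in sorted(rows)]

    def get_message(self, message_id):
        # Two reads: index table -> partition key, then the logical row itself.
        user_id = self.message_lookup.get(message_id)
        if user_id is None:
            return None
        return self.messages_by_user[user_id].get(message_id)


store = MessageStore()
store.save("u1", 2, "second")
store.save("u1", 1, "first")
store.save("u2", 3, "hi")
print(store.messages_for("u1"))  # ['first', 'second']
print(store.get_message(3))      # hi
```

The cost of this layout is extra writes; in exchange, each read touches at most two known partitions.
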
Time Series: Message Stream
● Users have tens of thousands of messages
● Each user’s message stream is specific to them, like a Twitter feed
● This is Cassandra’s strength: time series
● Considered Redis, but it’s poor for multi-DC
    create table news_feed (
        user_id uuid,
        message_id timeuuid,
        message text,  -- type not given on the slide; text assumed
        primary key (user_id, message_id)
    );

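The `news_feed` table answers one query: a user’s most recent messages. Because `user_id` is the partition key and `message_id` (a `timeuuid`) is the clustering key, that query reads a contiguous, already-sorted slice of a single partition. The in-memory stand-in below mimics this with Python’s version-1 UUIDs from `uuid.uuid1()`. Ordering by the embedded timestamp and then the full 128-bit value only approximates Cassandra’s `timeuuid` comparator, and `NewsFeed` is a hypothetical name.

```python
import uuid
from bisect import insort
from collections import defaultdict


class NewsFeed:
    """In-memory stand-in for the news_feed table (hypothetical helper).

    One list per user_id plays the role of a partition; entries stay sorted
    by the timestamp embedded in message_id, a version-1 (time-based) UUID,
    with the full 128-bit value as a tie-breaker.
    """

    def __init__(self):
        self._partitions = defaultdict(list)

    def insert(self, user_id, message_id, message):
        # insort keeps the partition ordered on every write, like a clustering key
        insort(self._partitions[user_id], (message_id.time, message_id.int, message))

    def latest(self, user_id, limit=20):
        """Newest first, roughly:
        SELECT message_id, message FROM news_feed
        WHERE user_id = ? ORDER BY message_id DESC LIMIT ?
        """
        if limit <= 0:
            return []
        rows = self._partitions.get(user_id, [])
        return [(uuid.UUID(int=i), m) for _, i, m in reversed(rows[-limit:])]


feed = NewsFeed()
user = uuid.uuid4()
ids = [uuid.uuid1() for _ in range(5)]
for n, message_id in reversed(list(enumerate(ids))):  # insert out of order on purpose
    feed.insert(user, message_id, f"message {n}")
for message_id, text in feed.latest(user, limit=3):
    print(message_id.time, text)
```

Paging through older messages would repeat the query with an upper bound on `message_id`, still within one partition.
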
cqlengine
● cqlengine.org
● The Python CQL3 object-row mapper
● Exposes CQL3 tables as Python classes
● Maps columns to properties
● Builds CQL queries
    from cqlengine import columns
    from cqlengine.models import Model

    # model definition
    class ExampleModel(Model):
        example_id = columns.UUID(primary_key=True)
        example_type = columns.Integer(index=True)
        created_at = columns.DateTime()
        description = columns.Text(required=False)

    # example query
    ExampleModel.objects(example_type=1)

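To make the three bullets concrete (classes for tables, attributes for columns, generated CQL), here is a toy mapper written from scratch. It is not how cqlengine is implemented and does not use its API; `ToyColumn`, `ToyModel`, `create_table_cql`, and `select_cql` are hypothetical names, and the table name `example_model` is an assumption set explicitly.

```python
class ToyColumn:
    """A column: its CQL type and whether it is part of the primary key."""

    def __init__(self, cql_type, primary_key=False):
        self.cql_type = cql_type
        self.primary_key = primary_key
        self.name = None

    def __set_name__(self, owner, name):
        # Python calls this while building the class, passing the attribute name
        self.name = name


class ToyModel:
    table_name = None  # subclasses set this explicitly in this sketch

    @classmethod
    def columns(cls):
        # class attributes in definition order, keeping only ToyColumn objects
        return [value for value in vars(cls).values() if isinstance(value, ToyColumn)]

    @classmethod
    def create_table_cql(cls):
        cols = ", ".join(f"{c.name} {c.cql_type}" for c in cls.columns())
        keys = ", ".join(c.name for c in cls.columns() if c.primary_key)
        return f"CREATE TABLE {cls.table_name} ({cols}, PRIMARY KEY ({keys}))"

    @classmethod
    def select_cql(cls, **filters):
        """Build a SELECT with one '?' bind marker per keyword filter."""
        names = [c.name for c in cls.columns()]
        unknown = sorted(set(filters) - set(names))
        if unknown:
            raise ValueError(f"unknown columns: {unknown}")
        cql = f"SELECT {', '.join(names)} FROM {cls.table_name}"
        if filters:
            cql += " WHERE " + " AND ".join(f"{name} = ?" for name in filters)
        return cql, list(filters.values())


class ExampleModel(ToyModel):
    table_name = "example_model"
    example_id = ToyColumn("uuid", primary_key=True)
    example_type = ToyColumn("int")
    created_at = ToyColumn("timestamp")
    description = ToyColumn("text")


print(ExampleModel.create_table_cql())
# CREATE TABLE example_model (example_id uuid, example_type int, created_at timestamp, description text, PRIMARY KEY (example_id))
print(ExampleModel.select_cql(example_type=1))
# ('SELECT example_id, example_type, created_at, description FROM example_model WHERE example_type = ?', [1])
```

In real CQL3, filtering on `example_type` alone is accepted only because the slide’s model declares `index=True` on that column; this toy does not model indexes.
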
Improvements from moving to C*
● Operationally we’ve had zero problems
● Outstanding performance
● Easy to build new features
● Community has been amazing (mailing list and #cassandra)

Misc Tips
● Leveled compaction: good for read-heavy workloads
● Use secondary indexes sparingly; understand how they work and when to use them
● To reiterate: think about how you’re going to query your data
● Use ElasticSearch / Solr for ad hoc queries

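On “understand how they work”: Cassandra’s built-in secondary indexes are local, with each node indexing only the data it stores. A lookup by partition key goes to the node(s) that own that key, but a lookup on an indexed column alone may have to ask every node. The simulation below shows that difference, assuming a single replica per key and a toy CRC32 placement function in place of the token ring; `Node`, `Cluster`, `get`, and `find` are hypothetical names.

```python
import zlib
from collections import defaultdict


class Node:
    """One node: the partitions it stores plus a *local* index over them."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                       # partition key -> row
        self.local_index = defaultdict(set)  # (column, value) -> keys stored on this node

    def write(self, key, row, indexed_columns):
        self.rows[key] = row
        for column in indexed_columns:
            self.local_index[(column, row[column])].add(key)


class Cluster:
    def __init__(self, node_names, indexed_columns):
        self.nodes = [Node(name) for name in node_names]
        self.indexed_columns = indexed_columns

    def _owner(self, key):
        # toy placement (CRC32 modulo node count) standing in for the token ring
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def insert(self, key, row):
        self._owner(key).write(key, row, self.indexed_columns)

    def get(self, key):
        """Lookup by partition key: only the owning node is contacted."""
        node = self._owner(key)
        return node.rows.get(key), [node.name]

    def find(self, column, value):
        """Lookup by indexed column: every node must consult its local index."""
        rows, contacted = [], []
        for node in self.nodes:
            contacted.append(node.name)
            for key in sorted(node.local_index.get((column, value), ())):
                rows.append(node.rows[key])
        return rows, contacted


cluster = Cluster(["n1", "n2", "n3"], indexed_columns=["example_type"])
for i in range(6):
    cluster.insert(f"id{i}", {"example_id": f"id{i}", "example_type": i % 2})

row, contacted = cluster.get("id3")
print(row, contacted)        # one node contacted
rows, contacted = cluster.find("example_type", 1)
print(len(rows), contacted)  # 3 ['n1', 'n2', 'n3']
```

The fan-out grows with cluster size, which is one reason to prefer a purpose-built lookup table (or ElasticSearch / Solr) for frequent or ad hoc queries.
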
Contact Info
Jon Haddad
@rustyrazorblade
jon@shift.com
Blake Eggleston
@blakeeggleston
blake@shift.com
…we’re hiring!