SlideShare a Scribd company logo
SHIFT.com
Migrating from MongoDB to Cassandra
by: Blake Eggleston & Jon Haddad
What is SHIFT.com?
Shift is a platform that enables marketers to
communicate across organizations and
departments in one single place.
It’s also an open application platform with a
set of applications built on top of it that can
communicate with one another.
Initial Stack
● Python
○ Flask
○ Celery

● MongoDB
○ mongoengine

● Neo4j / Titan
○ Bulbs
○ thunderdome

● Redis
● AWS
○ m1.xlarge for mongo
Current Stack
● Python
○ still flask
○ still celery
○ gevent (it rocks)

● Cassandra
○ 1.2.6
○ cqlengine

● ElasticSearch
● Redis
○ jondis

● AWS
○ m1.xlarge
Why did we move to Cassandra?
● Operational Benefits
○ Adding and removing nodes is much easier,
compared to Mongo’s shards

● Control over our Data on Disk (LSMT)
● Love CQL3
● Long term scalability
○ Scales Linearly
○ Multi DC Support Baked in
Migration Goals
● Zero downtime
○ We wanted to roll out Cassandra without any
service interruptions

● No loss of performance
○ By carefully structuring our schema we were able
to match MongoDB’s performance.
Migration Strategy
Benefits of CQL3
● Easy to understand if you’re coming from
RDBMS
● Collections
○ sets, lists, maps

● Batch Queries
● Clustering Keys
○ Handles ordering of logical rows
○ Saved us from column name management scheme
and allowed us to focus on our data
Physical vs Logical Row
Single Row
Clustered Row
Data Modelling Patterns
● considerations: working with Mongo’s dbrefs
and optimizing layout on disk
● structured tables as materialized views of
the queries we planned on using
● moving multiple documents into a single
physical row
● creating supporting index tables for looking
up logical rows
Time Series: Message Stream
● Users have tens of thousands of messages
● Each users message stream is specific to
them, like a twitter feed
● This is Cassandra’s strength - Time Series
● Considered Redis - but poor for multi-dc
create table news_feed (
user_id uuid,
message_id timeuuid,
message,
primary key (user_id, message_id));
cqlengine
●
●
●
●
●

cqlengine.org
the Python CQL3 object-row mapper
exposes CQL3 tables as Python classes
maps columns to properties
builds CQL queries
#model definition
class ExampleModel(Model):
example_id
= columns.UUID
(primary_key=True)
example_type
= columns.Integer(index=True)
created_at
= columns.DateTime()
description
= columns.Text(required=False)
# example query
ExampleModel.objects(example_type=1)
Improvements from moving to C*
●
●
●
●

Operationally we’ve had zero problems
Outstanding Performance
Easy to build new features
Community has been amazing (mailing list
and #cassandra)
misc tips
● leveled compaction - good for read heavy
workloads
● use secondary indexes sparingly,
understand how they work and when to use
them
● to reiterate, think about how you’re going
to query your data
● use elastic search / solr for ad hoc queries
Contact Info
Jon Haddad
@rustyrazorblade
jon@shift.com
Blake Eggleston
@blakeeggleston
blake@shift.com
….we’re hiring!

More Related Content

What's hot

Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
Claudiu Coman
 
Entity Framework Core
Entity Framework CoreEntity Framework Core
Entity Framework Core
Kiran Shahi
 
SqliteToRealm
SqliteToRealmSqliteToRealm
SqliteToRealm
Pluu love
 
Comparing Orchestration
Comparing OrchestrationComparing Orchestration
Comparing Orchestration
Knoldus Inc.
 
MinIO January 2020 Briefing
MinIO January 2020 BriefingMinIO January 2020 Briefing
MinIO January 2020 Briefing
Jonathan Symonds
 
Introduction to Bizur
Introduction to BizurIntroduction to Bizur
Introduction to Bizur
Akira Hayakawa
 
Groovy reactive streams
Groovy reactive streamsGroovy reactive streams
Groovy reactive streams
adam1davis
 
Dataspace presentatie
Dataspace presentatieDataspace presentatie
Dataspace presentatie
Roland Cornelissen
 
BigData in IoT #iotconfua
BigData in IoT #iotconfuaBigData in IoT #iotconfua
BigData in IoT #iotconfua
Andy Shutka
 
All you need to know about Kotlin's documentation engine Dokka
All you need to know about Kotlin's documentation engine Dokka All you need to know about Kotlin's documentation engine Dokka
All you need to know about Kotlin's documentation engine Dokka
Florian Benz
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
Fraugster
 
Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018
Orit Wasserman
 
Docker @haufe lexware tech lunch
Docker @haufe lexware tech lunchDocker @haufe lexware tech lunch
Docker @haufe lexware tech lunch
HaufeLexwareRomania
 
Open stack @ iiit hyderabad
Open stack @ iiit hyderabad Open stack @ iiit hyderabad
Open stack @ iiit hyderabad openstackindia
 
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesData Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Pradeeban Kathiravelu, Ph.D.
 
Введение в Apache Cassandra
Введение в Apache CassandraВведение в Apache Cassandra
Введение в Apache Cassandra
Open-IT
 
MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...
MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...
MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...
MongoDB
 
Intro to NoSQL and MongoDB
 Intro to NoSQL and MongoDB Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDB
MongoDB
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
Eric Evans
 

What's hot (20)

Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
 
Entity Framework Core
Entity Framework CoreEntity Framework Core
Entity Framework Core
 
SqliteToRealm
SqliteToRealmSqliteToRealm
SqliteToRealm
 
Comparing Orchestration
Comparing OrchestrationComparing Orchestration
Comparing Orchestration
 
MinIO January 2020 Briefing
MinIO January 2020 BriefingMinIO January 2020 Briefing
MinIO January 2020 Briefing
 
Introduction to Bizur
Introduction to BizurIntroduction to Bizur
Introduction to Bizur
 
Groovy reactive streams
Groovy reactive streamsGroovy reactive streams
Groovy reactive streams
 
Dataspace presentatie
Dataspace presentatieDataspace presentatie
Dataspace presentatie
 
BigData in IoT #iotconfua
BigData in IoT #iotconfuaBigData in IoT #iotconfua
BigData in IoT #iotconfua
 
All you need to know about Kotlin's documentation engine Dokka
All you need to know about Kotlin's documentation engine Dokka All you need to know about Kotlin's documentation engine Dokka
All you need to know about Kotlin's documentation engine Dokka
 
Intro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana GoriucIntro To Graph Databases - Oxana Goriuc
Intro To Graph Databases - Oxana Goriuc
 
Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018Cloud storage: the right way OSS EU 2018
Cloud storage: the right way OSS EU 2018
 
Docker @haufe lexware tech lunch
Docker @haufe lexware tech lunchDocker @haufe lexware tech lunch
Docker @haufe lexware tech lunch
 
Migration
MigrationMigration
Migration
 
Open stack @ iiit hyderabad
Open stack @ iiit hyderabad Open stack @ iiit hyderabad
Open stack @ iiit hyderabad
 
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesData Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
 
Введение в Apache Cassandra
Введение в Apache CassandraВведение в Apache Cassandra
Введение в Apache Cassandra
 
MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...
MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...
MongoDB IoT City Tour STUTTGART: Managing the Database Complexity, by Arthur ...
 
Intro to NoSQL and MongoDB
 Intro to NoSQL and MongoDB Intro to NoSQL and MongoDB
Intro to NoSQL and MongoDB
 
Wikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-caseWikimedia Content API: A Cassandra Use-case
Wikimedia Content API: A Cassandra Use-case
 

Viewers also liked

Crash course intro to cassandra
Crash course   intro to cassandraCrash course   intro to cassandra
Crash course intro to cassandraJon Haddad
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
Jon Haddad
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
Jon Haddad
 
Enter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy SparkEnter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy Spark
Jon Haddad
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
Jon Haddad
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
Jon Haddad
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
Jon Haddad
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day Toronto
Jon Haddad
 
Python and cassandra
Python and cassandraPython and cassandra
Python and cassandra
Jon Haddad
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - Denver
Jon Haddad
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best Friends
Jon Haddad
 
Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014
Jon Haddad
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
Jon Haddad
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
Jon Haddad
 
A educação de mônica
A educação de mônicaA educação de mônica
A educação de mônicaDalila Melo
 
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax Academy
 
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax Academy
 
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsBattery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
DataStax Academy
 
DataStax: 7 Deadly Sins for Cassandra Ops
DataStax: 7 Deadly Sins for Cassandra OpsDataStax: 7 Deadly Sins for Cassandra Ops
DataStax: 7 Deadly Sins for Cassandra Ops
DataStax Academy
 
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax Academy
 

Viewers also liked (20)

Crash course intro to cassandra
Crash course   intro to cassandraCrash course   intro to cassandra
Crash course intro to cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Cassandra 3.0 Awesomeness
Cassandra 3.0 AwesomenessCassandra 3.0 Awesomeness
Cassandra 3.0 Awesomeness
 
Enter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy SparkEnter the Snake Pit for Fast and Easy Spark
Enter the Snake Pit for Fast and Easy Spark
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
Cassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day TorontoCassandra Core Concepts - Cassandra Day Toronto
Cassandra Core Concepts - Cassandra Day Toronto
 
Python and cassandra
Python and cassandraPython and cassandra
Python and cassandra
 
Introduction to Cassandra - Denver
Introduction to Cassandra - DenverIntroduction to Cassandra - Denver
Introduction to Cassandra - Denver
 
Python & Cassandra - Best Friends
Python & Cassandra - Best FriendsPython & Cassandra - Best Friends
Python & Cassandra - Best Friends
 
Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014Diagnosing Problems in Production: Cassandra Summit 2014
Diagnosing Problems in Production: Cassandra Summit 2014
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Python performance profiling
Python performance profilingPython performance profiling
Python performance profiling
 
A educação de mônica
A educação de mônicaA educação de mônica
A educação de mônica
 
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
DataStax: How to Roll Cassandra into Production Without Losing your Health, M...
 
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetchDataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
DataStax: Old Dogs, New Tricks. Teaching your Relational DBA to fetch
 
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra DeploymentsBattery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
Battery Ventures: Simulating and Visualizing Large Scale Cassandra Deployments
 
DataStax: 7 Deadly Sins for Cassandra Ops
DataStax: 7 Deadly Sins for Cassandra OpsDataStax: 7 Deadly Sins for Cassandra Ops
DataStax: 7 Deadly Sins for Cassandra Ops
 
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
DataStax & O'Reilly Media: Large Scale Data Analytics with Spark and Cassandr...
 

Similar to Cassandra meetup slides - Oct 15 Santa Monica Coloft

Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to Cassandra
DataStax
 
Cassandra data access
Cassandra data accessCassandra data access
Cassandra data access
techblog
 
eBuddy | Cassandra Data Access in Java
eBuddy | Cassandra Data Access in JavaeBuddy | Cassandra Data Access in Java
eBuddy | Cassandra Data Access in Java
DataStax Academy
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
Ruslan Meshenberg
 
Building RESTtful services in MEAN
Building RESTtful services in MEANBuilding RESTtful services in MEAN
Building RESTtful services in MEAN
Madhukara Phatak
 
MongoDB
MongoDBMongoDB
MongoDB
wiTTyMinds1
 
Mongodb
MongodbMongodb
Mongodb
Thiago Veiga
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
Lawrence Mwai
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
DataStax Academy
 
What are the major components of MongoDB and the major tools used in it.docx
What are the major components of MongoDB and the major tools used in it.docxWhat are the major components of MongoDB and the major tools used in it.docx
What are the major components of MongoDB and the major tools used in it.docx
Technogeeks
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
Anna Ossowski
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
ElieHannouch
 
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleKyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Lviv Startup Club
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개
Ha-Yang(White) Moon
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
Adam Doyle
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB
 
Public Cloud Workshop
Public Cloud WorkshopPublic Cloud Workshop
Public Cloud Workshop
Amer Ather
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
Michael Spector
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
DigitalOcean
 

Similar to Cassandra meetup slides - Oct 15 Santa Monica Coloft (20)

Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to Cassandra
 
Cassandra data access
Cassandra data accessCassandra data access
Cassandra data access
 
eBuddy | Cassandra Data Access in Java
eBuddy | Cassandra Data Access in JavaeBuddy | Cassandra Data Access in Java
eBuddy | Cassandra Data Access in Java
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Building RESTtful services in MEAN
Building RESTtful services in MEANBuilding RESTtful services in MEAN
Building RESTtful services in MEAN
 
MongoDB
MongoDBMongoDB
MongoDB
 
Mongodb
MongodbMongodb
Mongodb
 
Introduction to mongo db
Introduction to mongo dbIntroduction to mongo db
Introduction to mongo db
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
What are the major components of MongoDB and the major tools used in it.docx
What are the major components of MongoDB and the major tools used in it.docxWhat are the major components of MongoDB and the major tools used in it.docx
What are the major components of MongoDB and the major tools used in it.docx
 
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
[Virtual Meetup] Using Elasticsearch as a Time-Series Database in the Endpoin...
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleKyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
 
MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개MongoDB 4.0 새로운 기능 소개
MongoDB 4.0 새로운 기능 소개
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
Public Cloud Workshop
Public Cloud WorkshopPublic Cloud Workshop
Public Cloud Workshop
 
MongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness PlatformMongoDB Versatility: Scaling the MapMyFitness Platform
MongoDB Versatility: Scaling the MapMyFitness Platform
 
Real-time analytics with Druid at Appsflyer
Real-time analytics with Druid at AppsflyerReal-time analytics with Druid at Appsflyer
Real-time analytics with Druid at Appsflyer
 
Kubernetes for Beginners
Kubernetes for BeginnersKubernetes for Beginners
Kubernetes for Beginners
 

Recently uploaded

Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 

Cassandra meetup slides - Oct 15 Santa Monica Coloft

  • 1. SHIFT.com Migrating from MongoDB to Cassandra by: Blake Eggleston & Jon Haddad
  • 2. What is SHIFT.com? Shift is a platform that enables marketers to communicate across organizations and departments in one single place. It’s also an open application platform with a set of applications built on top of it that can communicate with one another.
  • 3. Initial Stack ● Python ○ Flask ○ Celery ● MongoDB ○ mongoengine ● Neo4j / Titan ○ Bulbs ○ thunderdome ● Redis ● AWS ○ m1.xlarge for mongo
  • 4. Current Stack ● Python ○ still flask ○ still celery ○ gevent (it rocks) ● Cassandra ○ 1.2.6 ○ cqlengine ● ElasticSearch ● Redis ○ jondis ● AWS ○ m1.xlarge
  • 5. Why did we move to Cassandra? ● Operational Benefits ○ Adding and removing nodes is much easier, compared to Mongo’s shards ● Control over our Data on Disk (LSMT) ● Love CQL3 ● Long term scalability ○ Scales Linearly ○ Multi DC Support Baked in
  • 6. Migration Goals ● Zero downtime ○ We wanted to roll out Cassandra without any service interruptions ● No loss of performance ○ By carefully structuring our schema we were able to match MongoDB’s performance.
  • 8. Benefits of CQL3 ● Easy to understand if you’re coming from RDBMS ● Collections ○ sets, lists, maps ● Batch Queries ● Clustering Keys ○ Handles ordering of logical rows ○ Saved us from column name management scheme and allowed us to focus on our data
  • 12. Data Modelling Patterns ● considerations: working with Mongo’s dbrefs and optimizing layout on disk ● structured tables as materialized views of the queries we planned on using ● moving multiple documents into a single physical row ● creating supporting index tables for looking up logical rows
  • 13. Time Series: Message Stream ● Users have tens of thousands of messages ● Each users message stream is specific to them, like a twitter feed ● This is Cassandra’s strength - Time Series ● Considered Redis - but poor for multi-dc create table news_feed ( user_id uuid, message_id timeuuid, message, primary key (user_id, message_id));
  • 14. cqlengine ● ● ● ● ● cqlengine.org the Python CQL3 object-row mapper exposes CQL3 tables as Python classes maps columns to properties builds CQL queries #model definition class ExampleModel(Model): example_id = columns.UUID (primary_key=True) example_type = columns.Integer(index=True) created_at = columns.DateTime() description = columns.Text(required=False) # example query ExampleModel.objects(example_type=1)
  • 15. Improvements from moving to C* ● ● ● ● Operationally we’ve had zero problems Outstanding Performance Easy to build new features Community has been amazing (mailing list and #cassandra)
  • 16. misc tips ● leveled compaction - good for read heavy workloads ● use secondary indexes sparingly, understand how they work and when to use them ● to reiterate, think about how you’re going to query your data ● use elastic search / solr for ad hoc queries
  • 17. Contact Info Jon Haddad @rustyrazorblade jon@shift.com Blake Eggleston @blakeeggleston blake@shift.com ….we’re hiring!