I'm going to cover something that could be seen as essential for Cassandra but hasn't gotten much attention in the Cassandra community and literature: schema migrations, that is, how you go about pushing out and versioning changes to your keyspace and table definitions across environments. This is an area with established solutions in the relational database world, with tools like Liquibase (http://www.liquibase.org/) and Flyway (http://flywaydb.org/), and in web frameworks like Rails and Grails.
I'll outline the different types of migrations, then focus for most of the talk on schema migrations: how they have been done in the Cassandra community, and the roadblocks teams have faced trying to use Liquibase and Flyway to manage Cassandra migrations.
Then I'll share an elegant, lightweight schema migrations system that we at GridPoint built on top of Flyway. I'll use our system as a context for discussing schema migration best practices for Cassandra and the various choices teams have for their migrations and table definitions, including when NOT to use a tool like Flyway. I'll also touch on the other types of migrations besides keyspace and table definitions that can be versioned and driven off source control.
We've had some sexy, exciting, cutting-edge topics today. This is not one of them. ... This is more the sort of routine, good-housekeeping, foundational work that can make the exciting stuff a little less exciting. I’m going to be talking about managing migrations in Cassandra and in particular schema migrations.
Let me give a nod to my employer. From the web site: “GridPoint is a leader in comprehensive, data-driven energy management solutions (EMS) that leverage the power of real-time data collection, big data analytics and cloud computing to maximize energy savings, operational efficiency, capital utilization and sustainability benefits.” The company is based in Arlington, VA, with a development office in Seattle.
Disclaimer… This is my perspective.
Oh, the statue you see is from Bonn, Germany, according to the photographer.
A live-data migration is the process that runs to take the data in one table and adapt it to another table, such that the data in the first table can eventually be retired.
I’m not going to be focusing so much on live-data migrations.
I’m going to be focusing instead on what I would call source-driven migrations.
For schema migrations, think DDL.
The migrations are stored in source control and subject to source control versioning. They may be published to an artifact repository, where artifact versioning and release versioning can be applied.
I’ll be focusing in particular on schema migrations.
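To make this concrete, here's a minimal sketch of what a single source-driven schema migration might look like as a CQL file, assuming Flyway's V<version>__<description> file-naming convention; the table and columns are hypothetical, not our actual schema:

-- V2__create_sensor_readings.cql (hypothetical file name following Flyway's convention)
CREATE TABLE IF NOT EXISTS sensor_readings (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

Each file like this lives in source control next to the application code, so keyspace and table definitions get versioned, reviewed, and released the same way the code does.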
These sorts of problems are covered in depth in this book from the Martin Fowler series that came out in 2006.
I can’t speak to containerizing migrations. We haven’t explored that.
A couple other established standalone tools are DBMaintain and DBDeploy, although those projects have not been active in recent years.
12.2. Schema Changes in RDBMS
Liquibase, Mybatis Migrator, DBDeploy, DBMaintain
12.3. Schema Changes in a NoSQL Data Store
The schema needs to change frequently in response to changing business requirements; you can use similar techniques as with databases with strong schemas.
With schemalessness at the database level, the burden of supporting the effective schema shifts up to the application; the application still needs to be able to marshal and unmarshal the data.
With this slide, I hope you can see that I’m setting up a bit of a straw man. (A straw man with a strong man.)
There was a StackOverflow thread on schema migration tools for Cassandra (http://stackoverflow.com/questions/25286273/is-there-a-schema-versioning-tool-for-cassandra), and there was an erroneous answer I found amusing:
"Cassandra is by its nature… 'schemaless.' It is a structured key-value store, so it is very different from a traditional rdbms in that regard.”
Think about it though. With Cassandra as much as with a relational database, you pay a bitter price for getting your schema wrong.
You end up defining a good number of tables.
I have the fortune of not having worked much with Thrift. But I know that with Thrift, you'd be in the business of manipulating the contents of messages, which obscures the database's desire to have a schema applied to it.
With Thrift, you had super columns and super column families. With CQL, you have collections. But the collections still have to be part of a table. The things that might smack of schemalessness still come back to a schema.
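To make that concrete, here's an illustrative snippet (hypothetical table and column names): the collections are flexible in what they hold, but they're still typed, declared columns in a table schema.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CollectionsStillHaveSchema {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace");
        // The set and map columns feel schemaless, but they live inside
        // a declared table, with declared types.
        session.execute(
            "CREATE TABLE IF NOT EXISTS user_profiles (" +
            "  user_id uuid PRIMARY KEY," +
            "  emails set<text>," +
            "  preferences map<text, text>)");
        cluster.close();
    }
}
===========================================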
Thought experiment. Go into cqlsh and execute:
describe keyspace keyspace_name
How big is that output getting? How much is it changing over time?
===========================================
At last month's Cassandra Summit, there was an interesting talk by a company called Reltio, and they described how they were using Cassandra to support "metadata-driven documents in columnar storage." So they produced a keyspace that had a generic table like this. And maybe that schema only had one or two tables. But even they acknowledged that this is an atypical use case for Cassandra.
===========================================
So how have teams been managing their keyspace and table definitions? My anecdotal experience is that whenever the question has come up, teams have usually rolled their own, especially because, on the face of it, this seems like such a simple thing.
Next I want to get into the tools that are out there for Cassandra migrations, and the roadblocks teams have faced trying to manage Cassandra schema migrations via LiquiBase and Flyway.
===========================================
Some history. The obvious way to integrate Liquibase or Flyway with Cassandra comes back to the prospect of the DataStax Java Driver supporting JDBC. There’s this statement from the 2013 announcement of the introduction of the driver (http://www.datastax.com/dev/blog/new-datastax-drivers-a-new-face-for-cassandra): "Today, DataStax announces version 1.0.0 of a new Java Driver, designed for CQL and based on years of experience within the Cassandra community. This Java driver is a first step; an object mapping and a JDBC extension will be available soon…."
Let’s keep that JDBC extension in mind.
===========================================
There was a liquibase-cassandra project that seemed to hit a wall. So some people gravitated toward Flyway.
===========================================
Then there was a GitHub issue for the Flyway project, “Cassandra support.”
https://github.com/flyway/flyway/issues/823
In January someone mentions a cassandra-jdbc project that’s out there and which also seems to have hit a wall.
"I …recently looked into adding support for Cassandra to Flyway, but using the existing cassandra-jdbc driver from https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/ , just to see how far I could get. I found a few issues:"
Proceeds to list the issues.
"I disabled or stubbed out code to get past these, but gave up soon after."
That same poster referenced a thread he started on the DataStax Java Driver user mailing list.
===========================================
So if we go to that thread, which is from last December (https://groups.google.com/a/lists.datastax.com/forum/#!msg/java-driver-user/kspAx0neZlI/8A59HmYc-rwJ):
Subject: "Timeline for JDBC support?"
"Is there any timeline for JDBC support in the DataStax Java Driver for Cassandra, please?"
Alex Popescu, Sen. Product Manager @ DataStax responds:
"While I cannot (yet) promise an ETA for JDBC support, what I can say is that it's on our todo list (and very close to the top)."
===========================================
I look forward to seeing how DataStax pulls off the Cassandra JDBC support, but to my mind, trying to do JDBC against Cassandra seems like, I dunno, a bit of an uphill climb.
So let's set aside the prospect of first-class Cassandra support in Flyway and see what else is out there.
===========================================
Toward the end of the DataStax Java Driver mailing list thread, someone else chimes in and mentions Pillar, which is a dedicated Cassandra migrations tool written in Scala.
And here’s roughly what I wrote in my own internal tool evaluation:
“Before settling on (our) Flyway design for Cassandra schema migrations, I evaluated various open-source Cassandra migration tools. They’re listed below. Of them, the most promising tool was Pillar, which is implemented in Scala. The problem with Pillar vs. (Flyway) was the risk. I was afraid I’d invest time with Pillar and come up emptyhanded, that it wouldn’t deliver the sort of contract I expect from Flyway.” That’s what I wrote. I’m happy we went down the road we did (if I weren’t I wouldn’t be here talking about it), but I’d still maintain that Pillar is worth checking out.
There's mutagen-cassandra, which is a Java tool written against the Astyanax driver but which hasn't been adapted to the DataStax Java Driver.
Then there are these three Python-based tools: Trireme, cql-migrate, mschematool.
Here’s a view of a migrations table that’s responsible for several schemas in PostgreSQL; PostgreSQL’s concept of a schema is analogous to a keyspace in Cassandra.
So let’s get back to the two prominent database migration tools in the relational world.
I think of Liquibase as the Martha Stewart of migration tools. It’s somewhat of a control freak. It wants to do everything itself.
On the other hand, I think of Flyway as the Oprah of migration tools. It provides a framework and then gives you the space to figure things out for yourself.
You see, Liquibase wants to generate the SQL from XML constructs. In the typical usage, the SQL is NOT a first-class citizen. You can define Liquibase migrations as SQL, but even then (to the best of my knowledge) you have to define it inline in the XML.
With Flyway, though, SQL is a first-class citizen. You can make migrations out of straight .sql files. It’s Flyway’s lightweight, unobtrusive, extensible approach that’s going to provide the leverage for using it with Cassandra.
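To be concrete, in the simple case a Flyway migration is nothing more than a .sql file on the classpath, named by Flyway's V<version>__<description> convention (these particular file names are hypothetical):

src/main/resources/db/migration/V1__create_events_table.sql
src/main/resources/db/migration/V2__add_device_id_column.sql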
So instead of first-class Flyway, we’re going to do faked-out Flyway.
The idea is, let Flyway do what it knows, which is migrations. Let Cassandra do what it knows, which is CQL. All we need is an adapter or translator to connect the two.
And one key point. When I say that Flyway knows migrations, I’m saying that Flyway knows migrations in SQL.
So here’s the tradeoff. Or “the weird trick,” to use the parlance of an Internet ad.
Here’s what I wrote in my own internal design doc:
“The reality is that first-class Flyway support for Cassandra doesn’t really gain us anything more than our fake-Flyway solution does, especially considering that we’re fine with persisting the Flyway migrations table to PostgreSQL; once you’re embracing polyglot persistence, you’d realize that a relational database is a better fit anyway for keeping track of the migrations.”
Failure handling: If a migration produces invalid CQL, the driver throws a RuntimeException. That RuntimeException is the signal I need to tell the JDBC Connection to roll back the transaction, which emulates the JDBC contract where a RuntimeException causes the transaction in the actual migrate call to roll back. We do this in the beforeEachMigrate hook so that we have a chance to fail the migration before our dummy, token migration has a chance to run. Flyway will have succeeded with all the migrations up to that point; it will fail only with this particular migration. That preserves the expected Flyway behavior.
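To make the adapter idea concrete, here's a minimal sketch. This is not our actual code: the CqlExecutor type and the .sql-to-.cql naming convention are hypothetical stand-ins, and it assumes Flyway 3.x's callback API with a no-op base class to extend (in versions without one, you'd implement FlywayCallback directly).

import java.sql.Connection;
import org.flywaydb.core.api.MigrationInfo;
import org.flywaydb.core.api.callback.BaseFlywayCallback;

// Hypothetical abstraction over the DataStax Session: reads a .cql
// classpath resource and executes its statements against Cassandra.
interface CqlExecutor {
    void executeScript(String classpathResource) throws Exception;
}

public class CqlMigrationCallback extends BaseFlywayCallback {

    private final CqlExecutor cql;

    public CqlMigrationCallback(CqlExecutor cql) {
        this.cql = cql;
    }

    @Override
    public void beforeEachMigrate(Connection connection, MigrationInfo info) {
        // Pair the dummy .sql migration Flyway is about to run with the
        // real .cql script of the same simple name.
        String script = info.getScript().replace(".sql", ".cql");
        try {
            cql.executeScript(script);
        } catch (Exception e) {
            // Rethrowing as a RuntimeException is the rollback signal:
            // Flyway never records the dummy migration as applied.
            throw new RuntimeException("CQL migration failed: " + script, e);
        }
    }
}

You'd register it with something like flyway.setCallbacks(new CqlMigrationCallback(cql)) before calling migrate().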
Our migrations follow a two-step process. At build time, we produce an artifact that gets published to an artifact repository. That’s the work of a proprietary class called MigrationsBuilder. At runtime, we have another custom class called FlywayMigrator that runs the published migrations against the target database.
In the simple case with Flyway, there’s only a single step, the deploy-time step, even if that step might be executed at build time, or, to be precise, by a build tool like Maven or Gradle.
It’s worth noting that we use the same two-step process, with the same classes, in just the same way when the destination database is PostgreSQL.
We have the .cql files organized into directories according to our releases.
Here you can see that MigrationsBuilder is executed in a Maven build. And you can see that the execution for CQL, as opposed to SQL, differs only by some arguments.
Here we can see the output of MigrationsBuilder. MigrationsBuilder creates .sql files in a package structure that Flyway expects. But our .cql files just show up in the root of the classpath. The generated .sql files have the same simple names as the generated .cql files, and those names have been tweaked from the names in source control to comply with Flyway conventions.
Contains the CQL script’s contents.
This is the dummy, token script that the Flyway class executes with its migrate method.
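So, putting the two together, the generated artifact might look roughly like this (file names hypothetical):

target/classes/db/migration/V001__create_events_table.sql   (the dummy, token script that Flyway itself executes)
target/classes/V001__create_events_table.cql                (the real CQL, executed by our adapter)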
Now, at deploy time, when we go to execute FlywayMigrator against the destination database, you can see that the CQL and SQL invocations are quite similar.
Here we see the dependencies for the standalone JAR that’s executed at deploy time. Both JARs depend on the flywayMigrator library. The Cassandra JAR has only one other dependency because it has to support only one keyspace. The PostgreSQL JAR has numerous other dependencies because it has to support multiple schemas along with some migrations and constructs that don’t fit nicely in a schema.
Here you can see how the migrations version tracking table for Cassandra has been populated after a FlywayMigrator execution.
Now I want to go beyond our own Cassandra migration solution and share some best practices that I’ve arrived at and that I’d recommend however you do your migrations.
First, it’s worth keeping in mind the distinction between different kinds of versioning.
Regarding effective contract versions, there’s a nice discussion in Chapter 12 of “NoSQL Distilled” of making two schema versions coexist in a running application.
Consistent deployment across environments. You should be trying to execute your migrations the same way on a local dev box as you do in production. Or at least isolate the differences.
Failure handling: This goes back to the rollback semantics I was describing in beforeEachMigrate. The Flyway contract is that every migration up to the one that failed sticks, because every one of those migrations succeeded.
Baselining: If you haven’t been doing formalized database migrations from the get-go, you can use the current state of production as the starting point for your migrations: take the “describe keyspace” CQL from cqlsh and make that your initial migration, but only for installations that you want to create from scratch. And if you’ve made a lot of changes to your tables but your migrations haven’t made it to production yet, you can scrap all the history and start from your latest definitions. You get to call a mulligan. Declaring migration bankruptcy.
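On the Flyway side, baselining is supported directly, for installations that already exist: you mark the current state as already applied. A minimal sketch, assuming Flyway 3.x's API and placeholder connection details:

import org.flywaydb.core.Flyway;
import org.flywaydb.core.api.MigrationVersion;

public class BaselineExample {
    public static void main(String[] args) {
        Flyway flyway = new Flyway();
        // Placeholder DataSource details; in our setup this would be the
        // PostgreSQL database that hosts the migrations table.
        flyway.setDataSource("jdbc:postgresql://localhost/flyway_db", "flyway", "secret");
        // Treat the "describe keyspace" snapshot as version 1; Flyway will
        // then apply only migrations with versions above the baseline.
        flyway.setBaselineVersion(MigrationVersion.fromVersion("1"));
        flyway.setBaselineDescription("snapshot of production schema");
        flyway.baseline();
    }
}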
Rollbacks: Something that Liquibase supports. Part of why Liquibase tries to be such a control freak. Flyway, on the other hand, purposely does not support rollbacks. When I first looked into Flyway, that to me was a downside. But I eventually came around to the Flyway way of thinking. You keep progressing forward, even if you’re semantically going backwards. A little like an event sourcing paradigm.
The DataStax Java Driver has a nice mechanism for checking that your schema changes have propagated across the entire cluster. This snippet is taken from the DataStax Java Driver documentation.
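That snippet isn't reproduced in these notes, but here's a minimal sketch along the same lines, assuming driver 2.1.x and hypothetical keyspace and table names:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class SchemaAgreementCheck {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        ResultSet rs = session.execute(
            "ALTER TABLE my_keyspace.events ADD processed boolean");

        // The driver reports whether all nodes agreed on the new schema
        // version within its wait window; if not, keep polling.
        if (!rs.getExecutionInfo().isSchemaInAgreement()) {
            while (!cluster.getMetadata().checkSchemaAgreement()) {
                Thread.sleep(1000);
            }
        }
        cluster.close();
    }
}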
The graphic shows how a source-driven migration can expand, sometimes inevitably, into incorporating a live-data migration as well. Maybe you’re changing a column or moving from one table to another, and in the process, you need to copy over the data.
This isn’t so much a limitation. In a way, it’s a strength. Because we’re doing everything programmatically, there’s nothing stopping us from coupling a live-data migration with a source-driven migration. It’s just an extra amount of complexity to account for.
Now here is an actual limitation.
The two tables you see represent the same data, but with one having the data clustered in ascending order and the other with the data clustered in descending order. We need to have a time bucket to keep the partitions from growing indefinitely. In the ascending table, we’re able to incorporate the bucket into the partition key. But with the descending table, we want to be able to drop the tables entirely after a certain amount of time. So with those tables, we make the effective bucket part of the table name.
The ascending table, where the bucket is part of the partition key, we’re able to create statically in the migrations. But the descending table we have to create dynamically, on the fly, in the application, so it falls outside the realm of the migrations. I’m sure there’s a better solution out there; we’re living with this one for now.
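Here's roughly what that dynamic creation looks like, as a sketch only, with hypothetical table and column names:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import com.datastax.driver.core.Session;

public class BucketedTableCreator {
    private static final DateTimeFormatter BUCKET_FMT =
            DateTimeFormatter.ofPattern("yyyy_MM");

    public static void createDescendingTable(Session session, LocalDate bucket) {
        // The bucket lives in the table name, so an expired bucket can be
        // dropped wholesale with a single DROP TABLE.
        String table = "readings_desc_" + bucket.format(BUCKET_FMT);
        session.execute(
            "CREATE TABLE IF NOT EXISTS my_keyspace." + table + " (" +
            "  sensor_id uuid," +
            "  reading_time timestamp," +
            "  value double," +
            "  PRIMARY KEY (sensor_id, reading_time)" +
            ") WITH CLUSTERING ORDER BY (reading_time DESC)");
    }
}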
Some other considerations…
Making it part of the main app is what I believe a lot of teams do.
There’s another use case, where you want to migrate not CQL but actual SSTables. At that point you might consider storing the data in external storage like S3 or even a separate Cassandra cluster.
I mentioned Chapter 12 of “NoSQL Distilled,” “Schema Migrations.” Well, Chapter 13 is “Polyglot Persistence.”
And the authors proceed to state the obvious, that different databases solve different problems. Relational databases excel at enforcing the existence of relationships. Not good at discovering relationships or pulling data from different tables into a single object. (Of course, these days some folks will say relational databases aren’t good enough at anything to justify their existence, but even then, that doesn’t necessarily mean that Cassandra is the best fit for everything either.)
13.5. Choosing the Right Technology
"Initially, the pendulum had shifted from specialty databases to a single RDBMS database which allows all types of data models to be stored, although with some abstraction. The trend is now shifting back to using the data storage that supports the implementation of solutions natively."
"Encapsulating data access into services reduces the impact of data storage choices on other parts of a system.“
Our Flyway-based solution has the promise to be a unified migrations solution for disparate persistence stores. What you see here is the view in PostgreSQL’s pgAdmin3 GUI of our dedicated flyway schema. There are two tables, one for the Cassandra migration versions, the other for the PostgreSQL migration versions. The name of that one is flyway_schema_version; it should really be called postgresql_schema_version. Not that I want to be encouraging persistence store proliferation, but you could see how we could create another table for another RDBMS vendor or for another entirely different type of persistence store.
I hope by now you can appreciate that I’m not trying to sell you on our particular solution.
I am trying to sell you on the value of source-driven schema migrations for Cassandra, and more broadly on the value of adding automation in building blocks at the right granularity.
I’d initially figured this talk would be a better fit for the beginners’ track. It’s not one of the more challenging and exciting things you’ll be doing with Cassandra, but it’s doing routine, boring things like this that I believe will eventually pay off for you in your work with Cassandra.