Continuous Deployment with Cassandra

Continuous Deployment with C*:
Treating C* as First-Class Code
Michael Kjellman
@mkjellman
Software Engineer, Barracuda Networks

C* At Barracuda
• Powers 100% of our Spam and Webfilter Backend
• 48 Node Cluster
• 2 Datacenters
• Requests: 20k writes/sec, 30k reads/sec
• Latency: 1 ms/write, 1.6 ms/read
• > 30TB of Data
• Almost entirely native protocol/CQL3

Hardware Configuration
• 32GB of RAM
• 1x SSD
• 2x Spinning Disks
• 2x 6-Core AMD CPUs

Key Configuration Options
• key_cache_size_in_mb: 1024
• row_cache_size_in_mb: 0
• memtable_total_space_in_mb: 2048
• HEAP_NEWSIZE = "1200M" (-Xmn)
• MAX_HEAP_SIZE = "8G" (-Xmx)
• -XX:SurvivorRatio=6
• Sidenote: Java 7u40 is out!
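The heap options above are not cassandra.yaml keys; they live in conf/cassandra-env.sh, which is itself a shell script. A sketch of the corresponding overrides, assuming the stock 1.2-era script, which derives -Xms/-Xmx from MAX_HEAP_SIZE and -Xmn from HEAP_NEWSIZE and expects the two to be set together:

```shell
# conf/cassandra-env.sh (fragment, assuming the stock 1.2-era script).
# Set MAX_HEAP_SIZE and HEAP_NEWSIZE as a pair; the stock script
# refuses to start if only one of them is set.
MAX_HEAP_SIZE="8G"      # expands to -Xms8G -Xmx8G
HEAP_NEWSIZE="1200M"    # expands to -Xmn1200M

# Replaces the stock "-XX:SurvivorRatio=8" line further down the file.
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=6"
```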

How do I keep my graphs pretty during a C* upgrade?
September 18th 2013

Make a C* Build
$> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
$> cd cassandra
$> git checkout -t origin/cassandra-1.2
$> git log
$> vim build.xml (change the version number every time you make a build!)
$> ant clean release
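The manual version bump is easy to forget, so it can be scripted. A minimal sketch, assuming the version is held in build.xml's base.version property (as on the 1.2 branch, to the best of our knowledge); the helper names and the suffix scheme are hypothetical:

```shell
#!/bin/sh
# Sketch: stamp a unique version into build.xml before each build.
# Assumes a line of the form <property name="base.version" value="..."/>.
set -eu

# Print the base.version currently set in the given build file.
get_build_version() {
  sed -nE '/name="base\.version"/s|.*value="([^"]*)".*|\1|p' "$1"
}

# Rewrite base.version via a temp file (portable across GNU and BSD sed).
# The new version must not contain the characters | & or \.
set_build_version() {
  local build_file="$1" new_version="$2" tmp
  tmp="$(mktemp)"
  sed -E "/name=\"base\.version\"/s|value=\"[^\"]*\"|value=\"${new_version}\"|" \
    "$build_file" > "$tmp"
  mv "$tmp" "$build_file"
}
```

For example, set_build_version build.xml "1.2.9-local1" followed by ant clean release; the "-local1" suffix is a hypothetical local naming convention.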

Deployment
• Make a release
• Test the release with CCM
• Push the release to Puppet (Puppet handles the config, etc.)
• Run a controlled, scripted rolling restart, one datacenter at a time:
– flush
– stop
– start
– validate node
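The flush, stop, start, validate loop above lends itself to a small script. A minimal sketch for one datacenter, under hypothetical assumptions: nodes are listed by IP, passwordless ssh works, and Cassandra runs as the "cassandra" service. "Validate node" is reduced here to waiting until nodetool status reports the node as UN (Up/Normal); a real check would probably also watch the latency graphs.

```shell
#!/bin/sh
# Sketch of a scripted rolling restart for ONE datacenter.
# Hypothetical assumptions: nodes addressed by IP, passwordless ssh,
# Cassandra managed with "service cassandra".
set -eu

# Succeeds if the nodetool status text lists the given IP as UN.
node_is_up() {
  printf '%s\n' "$1" |
    awk -v ip="$2" '$1 == "UN" && $2 == ip { found = 1 } END { exit(found ? 0 : 1) }'
}

# Poll the node's own view of the ring until it reports itself UN.
wait_until_up() {
  local ip="$1" max_tries="${2:-60}" try=1 status
  while [ "$try" -le "$max_tries" ]; do
    # nodetool errors while the node is still starting; treat as "not yet".
    status="$(nodetool -h "$ip" status 2>/dev/null || true)"
    if node_is_up "$status" "$ip"; then
      return 0
    fi
    sleep 10
    try=$((try + 1))
  done
  echo "node $ip did not come back UN" >&2
  return 1
}

# flush -> stop -> start -> validate, one node at a time.
rolling_restart_dc() {
  for ip in "$@"; do
    echo "restarting $ip"
    nodetool -h "$ip" flush
    ssh "$ip" 'sudo service cassandra stop'
    ssh "$ip" 'sudo service cassandra start'
    # With set -e, a failed validation aborts the rest of the rollout.
    wait_until_up "$ip"
  done
}
```

For example, rolling_restart_dc 10.0.0.1 10.0.0.2 10.0.0.3 for the first datacenter (hypothetical addresses), then the same call for the second once the graphs look healthy.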

So, why not just apt-get install cassandra?
• Makes running a custom release in the future a complete nightmare
• You lose visibility into what changed in the release
• WHY are you upgrading?
• Treat a C* build just as if it were a release of your own code. What commits did you put into your own release?
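Both questions ("WHY are you upgrading?" and "What commits did you put into your own release?") can be answered straight from git if every deployed build is tagged. A minimal sketch; the tagging convention and the helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: show exactly which commits a new C* build adds on top of the
# build currently in production. Assumes every deployed build is tagged;
# tag names follow your own (hypothetical) convention.
set -eu

# One line per non-merge commit in new_ref that deployed_ref lacks.
release_changelog() {
  local deployed_ref="$1" new_ref="${2:-HEAD}"
  git log --oneline --no-merges "${deployed_ref}..${new_ref}"
}

# How many such commits there are: a quick "why are we upgrading?" check.
release_commit_count() {
  local deployed_ref="$1" new_ref="${2:-HEAD}"
  git rev-list --count --no-merges "${deployed_ref}..${new_ref}"
}
```

For example, release_changelog deployed-1.2.9-r1 origin/cassandra-1.2, run inside the Cassandra checkout, lists what the next build would add (the tag name is hypothetical).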

Simply Put:
MY CODE DOESN’T WORK WITHOUT A STABLE C* CLUSTER

When things go wrong
• Every commit (whether by C* committers or my own) comes with potential bugs and regressions
• Gossip Bugs Can Bite Hard:
– CASSANDRA-5665: Gossiper.handleMajorStateChange
can lose existing node ApplicationState
• At 48 nodes, even small mistakes have massive consequences

Writing your code to deal with node failure
• Upgrading a C* cluster means constant node
failures for the duration of the rolling restart
• How does your code deal with read latency and retries?
– CASSANDRA-4705: Eager retries for reads (2.0+)
• The mythical “constantly failing” code != stability.
– Handle exceptions (and node/read failures) gracefully!
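Server-side eager retries only arrive in 2.0, so on 1.2 the retry has to live on the client side. The same principle applies to any script that talks to the cluster during a rolling restart, such as a post-restart read check. A minimal sketch of retry with exponential backoff in shell; the helper name is hypothetical:

```shell
#!/bin/sh
# Sketch: retry a command with exponential backoff instead of failing on
# the first error. During a rolling restart some node is always down, so
# a single failed read is expected rather than exceptional.
set -eu

# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # exponential backoff: double the wait each time
  done
}
```

For example, retry_with_backoff 5 1 ./check_reads.sh 10.0.0.1 makes at most five attempts, sleeping 1, 2, 4 and 8 seconds in between; check_reads.sh is a hypothetical stand-in for whatever read you want to verify.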

Why treat C* like your own code
• Using C* will move much of your own
application logic to C*
• The bugs have to go somewhere!
• Data replication at database layer or at
application layer

QUESTIONS?
Thanks for Listening!
