Dockerizing Cassandra on Modern Linux
Myself & Instaclustr
• Adam Zegelin — Founding Software Engineer & Co-founder of Instaclustr

adam@instaclustr.com · @zegelin
• Managed DataStax Enterprise and Apache Cassandra in the ☁ (AWS, Azure, SoftLayer)
• Self-service dashboard — create, manage & monitor clusters
• 24/7/365 support, on-call engineers, uptime guarantee
• Focus on developing your awesome apps — we handle the Cassandra
• Grew from a need for Cassandra in a project
© 2015. All Rights Reserved.
Nodes — Software Stack
• CoreOS — lightweight OS
• Docker — containerisation of everything
• systemd: service management
• journald — logging
• D-Bus — controlling systemd from Java from inside containers
Initial Implementation
• Amazon Web Services only
• Custom Ubuntu AMI (Amazon Machine Image)
• Based on stock Ubuntu AMI
• 2 AMIs (PV/HVM) × 9 regions = 18 images per version! (became unmaintainable very quickly)
• Custom cloud-init scripts — RAID disks, fetch config, etc.
• Cassandra installed with apt-get install cassandra / dse
Initial Implementation — AWS
• We selected AWS instances backed by instance storage
• Instance storage is fast (SSDs) and low latency (local disk) but is volatile
— terminate the instance and all your data is gone
• The alternative, EBS (Elastic Block Store), is basically a SAN: slower, higher latency, and originally sharing the instance's network bandwidth
• The newer c4.x and m4.x instances are “EBS optimised” and don’t have this bandwidth limitation
• Only way to change AMI is to start a new machine
• Not possible to combine immutable images with data that must survive on ephemeral instance storage
• Only feasible solution for updates is apt-get install
CoreOS
• One of the first “Docker Operating Systems”
• Available on every provider we support — AWS, Azure, SoftLayer
• CoreOS has pre-built images
• Small and minimalist — not much userland (not even man!)
• Other useful software: etcd, fleet, etc. (we currently don’t use them, but maybe in the future)
• In use by some big players (Rackspace, PlayStation, Instaclustr 😀 )
• Recent funding from Google Ventures
Docker
• Container runtime + standardised image distribution & hosting + ecosystem
• Private image hosting options available, such as quay.io
• Immutable images — Yay! 🎉
• Images running in dev, test and production environments are equal
• Software installs, upgrades and uninstalls are clean
• Components are isolated — potentially conflicting components (different library
versions, JVM versions, etc.) can co-exist
• Even different userland layouts (Ubuntu, Debian, CentOS, etc)
Docker cont’d
• We containerise everything: C*, internal services, node management and monitoring apps
• Single, well understood, image build and deploy process —
docker build & docker push
• Executed via Makefiles — one Make target per image — make push-all builds
and pushes everything
• Helps that all our internal apps are Java-based too
Docker + CoreOS
• Docker gives us immutable images for our components without instance replacement
• CoreOS handles the rest (OS-level) via in-place updates
• Docker is provider agnostic
• CoreOS runs on all major cloud providers and bare-metal
• The result ☞ Instaclustr-managed C* can run anywhere
systemd
• CoreOS uses systemd for service management
• systemd supports inter-service dependencies
• e.g. cassandra-backups.service “wants” cassandra.service
• i.e., starting cassandra-backups also starts cassandra (Wants is the weak form of dependency; Requires is the strict one)
• systemd can automatically restart services
• Instaclustr services are fail-fast
• Cassandra not so much — in some cases — watchdog?
systemd cont’d
• Manages units of different types — service, timer, target, etc.
• service units manage processes
• timers start services on a schedule (à la cron)
• targets are for grouping/sync points
• cassandra.target “wants” cassandra.service, monitoring.service, datastax-agent.service, backups.timer, etc.
• All units can define dependencies and conflicts
• Dependencies of different “strengths” — Wants vs. Requires
• In both directions — Requires and RequiredBy
Basic Integration
• Cassandra runs as PID 1 in the container
• 1 primary process per container model
• Runs in foreground mode (-f)
• Responds to SIGTERM via docker stop, systemctl stop, etc
• Cassandra data and configuration are persisted on the host
• Survives container restart
• Cassandra data and configuration directories mounted from host

docker run -v /var/lib/instaclustr/etc/cassandra:/etc/cassandra …
Basic Integration cont’d
• Docker containers managed via systemd
• cassandra.service execs docker run cassandra …
• systemctl [start|stop|restart|status|…] cassandra
• Cassandra logging configured to write only to stdout
• systemd logging best practice
• Cassandra ⇢ Docker ⇢ systemd ⇢ journald
• journalctl -u cassandra
Basic Integration — Issues
• systemd starts dependent units when state is active
• process running = service active — unless configured otherwise
• ∴ dependent units start immediately
• process can hang but service stays active
Cassandra Startup
• JVM starts quickly
• JMX (nodetool) connectivity is available early
• Management objects are exposed over JMX as they are constructed
• CQL/Thrift available late
• Can be toggled via cassandra.yaml or JMX/nodetool
• When is Cassandra “running”?
• When does cassandra.service transition from activating to active?
• When do dependent services start?
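One possible answer to "when is Cassandra running?" is "when the CQL port accepts connections". The following is a minimal sketch of that idea, not the approach the slides go on to describe: the class name CqlReadinessProbe and its methods are hypothetical, and a bare TCP connect is assumed to be a good enough signal (it is weaker than a real CQL handshake).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe: treats "the port accepts TCP connections" as "running". */
final class CqlReadinessProbe {
    private CqlReadinessProbe() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out: not ready yet
        }
    }

    /** Polls until the port opens or maxAttempts probes have failed. */
    static boolean awaitPort(String host, int port, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isPortOpen(host, port, 1000)) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        return false;
    }
}
```

With Cassandra's default native transport port this would be called as CqlReadinessProbe.awaitPort("127.0.0.1", 9042, 60, 1000). Polling from outside is simple but only approximates the moment CQL becomes available.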
D-Bus
• RPC between processes
• Notifications
• Socket-based (typically UNIX sockets, but can be TCP)
• Accessible inside a container — mount the socket

docker run -v /run/dbus:/run/dbus -v /run/systemd:/run/systemd …
• Multiple language bindings, including Java
D-Bus cont’d
• systemd is controllable via D-Bus
• Control host systemd from inside a Docker container
• No need to fork/exec to run systemctl and co. (in fact, systemctl is a wrapper around D-Bus calls)
D-Bus cont’d
Java bindings — dbus-java
systemctl restart cassandra
≝
systemdManager.RestartUnit("cassandra.service", "replace");
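The same equivalence holds for the other common verbs. The sketch below spells it out using only the method names of systemd's org.freedesktop.systemd1.Manager D-Bus interface (StartUnit, StopUnit, RestartUnit, ReloadUnit). SystemctlToDbus and its methods are hypothetical helpers written for illustration, and no dbus-java call is made, so the block compiles on its own.

```java
/** Hypothetical helper: maps systemctl verbs to systemd Manager D-Bus method names. */
final class SystemctlToDbus {
    private SystemctlToDbus() {}

    /** Name of the Manager method that corresponds to a systemctl verb. */
    static String managerMethod(String verb) {
        switch (verb) {
            case "start":   return "StartUnit";
            case "stop":    return "StopUnit";
            case "restart": return "RestartUnit";
            case "reload":  return "ReloadUnit";
            default:
                throw new IllegalArgumentException("unsupported verb: " + verb);
        }
    }

    /**
     * systemctl accepts "cassandra", but the D-Bus API expects the full unit
     * name. Simplification: any name containing a dot is assumed to carry a suffix.
     */
    static String unitName(String name) {
        return name.contains(".") ? name : name + ".service";
    }

    /** Renders the call in the style used on the slide, with job mode "replace". */
    static String describe(String verb, String name) {
        return managerMethod(verb) + "(\"" + unitName(name) + "\", \"replace\")";
    }
}
```

For example, SystemctlToDbus.describe("stop", "cassandra-cql.target") renders the D-Bus call that corresponds to systemctl stop cassandra-cql.target.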
Enhanced Integration
• Service status = “active” — process running, or something more?
• Cassandra java process running vs. C* accepting CQL connections
• CQL clients are dependencies, but shouldn’t start until CQL is available
• Clients could fail-fast on no connectivity
• Will be automatically restarted
• Service will oscillate between active and failed — hard to detect
actual failures
• systemd will eventually time out or give up (configurable)
• JVM startup can be expensive — CPU usage spikes
Enhanced Integration cont’d
• systemd targets for CQL & Thrift — cassandra-cql.target
• Life-cycle tracks internal C* service
• i.e., Starts when CQL is available — not immediate
• nodetool disablebinary implies systemctl stop cassandra-cql.target
• Services that require CQL connectivity use

WantedBy=cassandra-cql.target
• Starting cassandra-cql.target starts these services too
• Inverse of Wants
Enhanced Integration cont’d
• Java Agent side-loaded into Cassandra JVM
• Hooks into CQL/Thrift service life-cycle
• Implemented using runtime byte-code modification
• Controls systemd via D-Bus to start/stop associated
target units
• But Cassandra is open-source, so why not modify it‽
• Agents work with DSE & Apache Cassandra
Java Agent
• Java Agents (java.lang.instrument)
• java -javaagent:instaclustr-agent.jar …
• premain(…) method called at JVM startup
• can hook into JVM class-loading, transform byte-code, etc.
• Javassist, ASM — byte-code modification libraries
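The mechanics in the bullets above can be sketched with nothing but the JDK. This is an illustrative skeleton, not Instaclustr's agent: SketchAgent and isTarget are hypothetical names, and the transformer only logs the class it would patch instead of rewriting any byte-code.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

/**
 * Hypothetical agent skeleton. In a real agent jar this class would be
 * declared public and named in the manifest's Premain-Class attribute.
 */
final class SketchAgent {
    /** Transformers see JVM-internal names: slash-separated, not dotted. */
    static final String TARGET = "org/apache/cassandra/transport/Server";

    private SketchAgent() {}

    static boolean isTarget(String internalName) {
        return TARGET.equals(internalName);
    }

    /** Called by the JVM before main() when started with -javaagent:... */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                if (isTarget(className)) {
                    System.err.println("would instrument " + className);
                }
                return null; // null tells the JVM to keep the original bytes
            }
        });
    }
}
```

Returning null for every class makes the agent a no-op, which is a convenient starting point before a library such as Javassist or ASM is used to produce modified bytes.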
Hooks
public interface Server {
    public void start();

    public void stop();
    ⋮
}

// in CassandraDaemon:
// Thrift
thriftServer = new ThriftServer(rpcAddr, rpcPort, listenBacklog);
⋮
thriftServer.start();
⋮
thriftServer.stop();

// CQL
nativeServer = new org.apache.cassandra.transport.Server(nativeAddr, nativePort);
⋮
nativeServer.start();
⋮
nativeServer.stop();
Hooks cont’d
import java.lang.instrument.Instrumentation;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import org.apache.cassandra.service.CassandraDaemon;

// in com.instaclustr.Agent:
public static void premain(String agentArgs, Instrumentation inst) {
    inst.addTransformer((loader, className, classBeingRedefined, protectionDomain, classfileBuffer) -> {
        // only the CQL native transport Server is patched; returning null leaves a class untouched
        if (!"org/apache/cassandra/transport/Server".equals(className))
            return null;

        final ClassPool pool = ClassPool.getDefault();
        try {
            final CtClass ctClass = pool.get("org.apache.cassandra.transport.Server");

            // patch start() and stop() methods of the Server class
            {
                final CtMethod method = ctClass.getDeclaredMethod("start");
                method.insertAfter("com.instaclustr.Agent.serverStarted($0);"); // $0 is `this`
            }
            {
                final CtMethod method = ctClass.getDeclaredMethod("stop");
                method.insertAfter("com.instaclustr.Agent.serverStopped($0);");
            }

            byte[] byteCode = ctClass.toBytecode();
            ctClass.detach();

            return byteCode; // return the modified byte-code

        } catch (final Exception e) {…}

        return null;
    });
}

// called when Server started: call systemd via dbus-java to start cassandra-cql.target
public static void serverStarted(final CassandraDaemon.Server server) {…}

// called when Server stopped: call systemd via dbus-java to stop cassandra-cql.target
public static void serverStopped(final CassandraDaemon.Server server) {…}
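The bodies of the two callbacks are elided on the slide. One way they could be wired is sketched below, under stated assumptions: UnitController, CqlTargetHooks and RecordingController are hypothetical names, and a real UnitController would be backed by dbus-java calls to systemd rather than by the in-memory recorder used here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over "ask systemd to start or stop a unit". */
interface UnitController {
    void startUnit(String unit);
    void stopUnit(String unit);
}

/** Hypothetical hook target: mirrors the CQL server life-cycle onto a systemd target. */
final class CqlTargetHooks {
    static final String CQL_TARGET = "cassandra-cql.target";

    private final UnitController controller;

    CqlTargetHooks(UnitController controller) {
        this.controller = controller;
    }

    void serverStarted() {
        try {
            controller.startUnit(CQL_TARGET);
        } catch (RuntimeException e) {
            // a failed notification must not take down the patched server
            System.err.println("could not start " + CQL_TARGET + ": " + e);
        }
    }

    void serverStopped() {
        try {
            controller.stopUnit(CQL_TARGET);
        } catch (RuntimeException e) {
            System.err.println("could not stop " + CQL_TARGET + ": " + e);
        }
    }
}

/** In-memory stand-in for systemd, for trying the hooks without a bus. */
final class RecordingController implements UnitController {
    final List<String> calls = new ArrayList<>();

    @Override
    public void startUnit(String unit) {
        calls.add("start " + unit);
    }

    @Override
    public void stopUnit(String unit) {
        calls.add("stop " + unit);
    }
}
```

Catching exceptions inside the hooks is a deliberate choice in this sketch: the injected code runs inside Cassandra's own start() and stop() methods, so an error talking to systemd should be logged rather than propagated.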
Docker Limitations and Sore Spots
• docker run is just a TTY proxy — actual container process is under
the docker dæmon process/cgroup
• systemd requires startup & watchdog notifications to originate
from started process, child, or process in same cgroup
• docker crash = all containers go bye-bye
• docker … everything — inc. image downloads & builds — runs as
root in the dæmon!
• processes inside containers are run un-elevated
Future
• Development versions of systemd can now launch Docker containers natively via machinectl
• Tighter integration with systemd
• Process hierarchy is correct — right cgroup and parents
• Java Agent can notify systemd for startup, status &
watchdog — via JNA + libsystemd
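The startup, status and watchdog notifications mentioned above use systemd's sd_notify protocol, whose payload is a newline-separated list of KEY=VALUE assignments such as READY=1, STATUS=... and WATCHDOG=1. The sketch below only builds that payload. SdNotifyMessage is a hypothetical class, and actually delivering the bytes to the socket named by NOTIFY_SOCKET (for example via JNA and libsystemd, as the slide suggests) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical builder for an sd_notify state string. */
final class SdNotifyMessage {
    // insertion-ordered so the rendered message is predictable
    private final Map<String, String> fields = new LinkedHashMap<>();

    /** Tells systemd that service startup has finished. */
    SdNotifyMessage ready() {
        fields.put("READY", "1");
        return this;
    }

    /** Free-form status text, as shown by systemctl status. */
    SdNotifyMessage status(String text) {
        // newlines separate assignments, so they cannot appear inside a value
        fields.put("STATUS", text.replace('\n', ' '));
        return this;
    }

    /** Watchdog keep-alive ping. */
    SdNotifyMessage watchdog() {
        fields.put("WATCHDOG", "1");
        return this;
    }

    String render() {
        StringJoiner joiner = new StringJoiner("\n");
        fields.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    byte[] toBytes() {
        return render().getBytes(StandardCharsets.UTF_8);
    }
}
```

A Java Agent hook could, for instance, build new SdNotifyMessage().ready().status("CQL available") once the native transport has started and hand the bytes to whatever binding performs the send.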
Thanks!
