Looking towards an Official C* Sidecar
Joey Lynch and Vinay Chella
Cloud Database Engineering
Speakers
Joey Lynch
Senior Software Engineer
Cloud Database Engineering
Netflix
Distributed system addict and data wrangler
Vinay Chella
Senior Software Engineer
Cloud Database Engineering
Netflix
Cassandra Specialist, Distributed systems engineer and creator of NdBench
Agenda
● Operating C*
● Operating C* with a sidecar
● State of community sidecars
● Lessons learned from Priam (Netflix’s main C* sidecar)
● Goals of C* management sidecar (CASSANDRA-14395)
Operating C*
● Bootstrap and data movement
● Configuration (files, jmx)
● Maintenance
● Monitoring/Metrics
● Backup/Restore
● Repair
Operating C*: Bootstrapping
Create a New Cluster
● Seeds
● Token assignment
Add/Remove/Replace
● Serial or parallel?
● Streaming?
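To make "token assignment" concrete, here is a minimal sketch that computes evenly spaced initial tokens for a new cluster. It assumes the Murmur3Partitioner token range of -2^63 to 2^63 - 1 and one token per node (no vnodes); the function name is hypothetical and not part of any Cassandra tooling.

```python
# Murmur3Partitioner tokens span [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
RING_SIZE = 2**64


def evenly_spaced_tokens(node_count):
    """Return one initial token per node, spaced evenly around the ring.

    Assumes single-token nodes; with vnodes, token allocation works differently.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [MIN_TOKEN + (i * RING_SIZE) // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Each value could be placed in a node's initial_token setting.
    for index, token in enumerate(evenly_spaced_tokens(4)):
        print(f"node{index}: initial_token={token}")
```

Doing this by hand for every new cluster, and again whenever nodes are added or removed, is part of the operational burden the slide is pointing at.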
Operating C*: Configuration
● Probably Have to Tune
a. cassandra.yaml
b. topology props
c. JVM options
● May Have to Tune
a. Logging
b. Incremental Backup
c. More JVM options
Operating C*: Lifecycle
Rolling Restarts (Upgrades)
● Semi-complex single node procedure
● One at a time is too slow
● Token range aware restarts?
What happens when Cassandra dies?
Ring source @ https://v2.overleaf.com/read/zchtrzskkyjb
Operating C*: Maintenance
● All the Power of JMX
● … So many possibilities
a. Many work with jmxterm/jmxsh
b. Many only work with Java code
● What if you want to do it on all nodes?
Operating C*: Monitoring
● Many Metrics (good!)
● How to Collect Them?
○ JMX … no
○ Agent!
● Which agent ...
Operating C*: Ring Health
● Cassandra ring health depends on replication
● Strategies
○ Monitor replication of keyspaces
○ Topology Aware
○ Maintenance Aware
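A minimal sketch of why ring health depends on replication: a node being down only matters if it leaves some token range without a quorum of live replicas. The sketch assumes a single datacenter, one token per node, and SimpleStrategy-like placement (each range is replicated on the next RF nodes clockwise). All names are hypothetical, and the topology-aware and maintenance-aware strategies from the slide are not modelled.

```python
def replica_sets(ring, rf):
    """Return the replica list for each range, walking the ring clockwise."""
    n = len(ring)
    return [[ring[(i + k) % n] for k in range(min(rf, n))] for i in range(n)]


def ranges_without_quorum(ring, rf, down):
    """Return replica sets that cannot serve QUORUM given the down nodes."""
    effective_rf = min(rf, len(ring))
    quorum = effective_rf // 2 + 1
    unhealthy = []
    for replicas in replica_sets(ring, rf):
        up = [node for node in replicas if node not in down]
        if len(up) < quorum:
            unhealthy.append(replicas)
    return unhealthy


if __name__ == "__main__":
    ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
    # Two adjacent nodes down breaks quorum for the ranges they share.
    print(ranges_without_quorum(ring, 3, {"n1", "n2"}))
    # Two nodes far apart on the ring do not.
    print(ranges_without_quorum(ring, 3, {"n1", "n4"}))
```

The same check is what a token range aware restart would need before taking a second node down.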
Operating C*: Backup/Restore
The Cloud
● What even do I need to backup!?
● Restore is legitimately tricky, do you practice?
Operating C*: Repair
“Eventually” Consistent
1. Partial Write
2. Read Repair
3. Hints play
… Nope not enough
4. Repair

Datacenter 1      Datacenter 2
N1  N2  N3        N4  N5  N6
0   1   0         0   0   0
0   1   1         0   0   0
0   1   1         0   1   0
0   1   1         0   1   0
1   1   1         1   1   1
Operating C* In General
Separate solutions for ...
● Bootstrap and data movement
● Maintenance
● Configuration (files, jmx)
● Monitoring/Metrics
● Backup/Restore
● Repair
So ... What is needed to use C*?
We need better tools!
Operating C* with Sidecar(s)
What’s a Sidecar?
Cassandra
metrics-agent
Priam
jvmkill
...
Sidecars Live Outside Main Daemon Scope
● Often built for a specific purpose
● Typically a different OS process
● There isn’t going to be “one” sidecar
Sidecar: Bootstrapping
Automatic Seed Management using ASGs/db
Automatic Instance Replacement
Equation+Graph from “Cassandra Availability with Virtual Nodes” by Joey Lynch and Josh Snyder
Sidecar: Configuration
● Hierarchy: Environment -> Cluster -> Node
● Flat namespace that is merged to provide Priam config
Sidecar: Configuration
● Hierarchy
● Flat namespace that is merged to provide Priam config
● Functions for defaults (e.g. based on cpu)
prod
cass_nflx
i-08da5d...
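The hierarchy and flat namespace described above can be sketched as a layered merge in which more specific layers override more general ones, with callable values standing in for "functions for defaults". This is only an illustration of the idea, not Priam's implementation; the function names, keys, and values are all hypothetical.

```python
def merge_layers(*layers):
    """Merge flat key/value layers; later (more specific) layers win."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged


def resolve_defaults(config, facts):
    """Replace callable values with the result of calling them on node facts."""
    return {key: value(facts) if callable(value) else value
            for key, value in config.items()}


# Hypothetical keys and values, for illustration only.
environment = {
    "backup.enabled": True,
    # A default computed from the hardware, e.g. based on cpu count.
    "compaction.threads": lambda facts: max(1, facts["cpu_count"] // 2),
}
cluster = {"cluster.name": "cass_nflx"}
node = {"backup.enabled": False}

# Environment -> Cluster -> Node, then evaluate the function defaults.
effective = resolve_defaults(merge_layers(environment, cluster, node),
                             {"cpu_count": 8})
```

Because the namespace is flat, a node-level override is a single key rather than a patch to a nested document.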
Sidecar: Lifecycle
Fail Healthcheck
Drain with timeout
Execute Stop Script (systemd)
Sidecar: Lifecycle
Pass Healthcheck
Execute Start Script (systemd)
Ensure Health
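The stop and start sequences from the lifecycle slides can be sketched as two small functions. Every step is passed in as a callable, so nothing here claims to be Priam's or systemd's actual interface. The stop order follows the slide (fail healthcheck, drain with timeout, execute stop script). The start order is an assumption: run the start script, wait for the node to be healthy, and only then pass the healthcheck.

```python
import time


def stop_node(fail_healthcheck, drain, run_stop_script, drain_timeout_s=30.0):
    """Take the node out of service, drain it, then stop the process.

    `drain` receives the timeout and returns True if it finished in time.
    The stop script runs either way, so a stuck drain cannot block a restart.
    """
    fail_healthcheck()
    drained = drain(drain_timeout_s)
    run_stop_script()
    return drained


def start_node(run_start_script, is_healthy, pass_healthcheck,
               attempts=10, delay_s=1.0):
    """Start the process and report healthy only once the node really is."""
    run_start_script()
    for _ in range(attempts):
        if is_healthy():
            pass_healthcheck()
            return True
        time.sleep(delay_s)
    return False
```

A rolling restart then reduces to calling `stop_node` and `start_node` on one node at a time, or on several nodes whose token ranges do not overlap.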
Rolling Restarts (Upgrades)
● Cluster automation is now much easier
What happens when Cassandra dies?
● Continuous health monitoring and supervision (OOM)
● Priam + systemd + jvmkill [1] == pretty good
1: https://github.com/airlift/jvmkill
Sidecar: Maintenance
● JMX methods on cron
● Can add arbitrary tasks like compactions, flushes, etc
Sidecar: Maintenance
● Sidecar provides JMX over HTTP
○ Cleanup
○ Invoke complex JMX methods using curl
○ Many of these are better done scheduled (e.g. repair, compaction, flushes, etc.)
Sidecar: Monitoring
Cassandra
metrics-agent
Metrics Sink
Partition tolerant push
● Sidecar agent = scalable
● Non agent JMX metric export … not even once
● Still a sidecar, just not Priam
Sidecar: Ring Health
● Sidecar monitoring local node
● Can also look at the ring
○ Gossip state
● Can
○ Ask a few Priams
○ Export info to streaming system
Sidecar: Backup/Restore
● Backup on cron schedule
○ Snapshot
● Parallel incrementals
● Verification
● Pluggable with code (S3, GCS, etc…)
Sidecar: Backup/Restore
(coming soon)
● Similar to cassandra-mirror [1]
● Keeps track of what files have already been uploaded
○ Unlocks minute level snapshot backups
○ Only uploads files once
● Continuous, point in time, self healing, eventually consistent
1: https://github.com/hashbrowncipher/cassandra-mirror
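A minimal sketch of the "only uploads files once" idea: SSTable files are immutable once written, so a backup pass only needs to upload file names it has not recorded before. This is an illustration under that assumption, not the cassandra-mirror or Priam implementation; the function names and the in-memory `uploaded` set are hypothetical stand-ins for a durable manifest.

```python
def plan_upload(current_files, uploaded):
    """Return files present on disk that have not been uploaded yet, sorted."""
    return sorted(set(current_files) - set(uploaded))


def backup_once(current_files, uploaded, upload):
    """Upload each new file exactly once and record it in `uploaded`.

    A file is recorded only after `upload` returns, so a failed upload is
    retried on the next pass (the "self healing" property).
    """
    new_files = plan_upload(current_files, uploaded)
    for name in new_files:
        upload(name)
        uploaded.add(name)
    return new_files


if __name__ == "__main__":
    uploaded = set()
    sent = []
    backup_once(["a-Data.db", "b-Data.db"], uploaded, sent.append)
    # A minute later only the newly flushed file needs to go out.
    backup_once(["a-Data.db", "b-Data.db", "c-Data.db"], uploaded, sent.append)
    print(sent)
```

Because each pass is cheap when nothing has changed, it can run every minute instead of once per day.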
Sidecar: Repair
(coming soon)
● Always on
● Just works
● In Cassandra itself? [1]
1: https://issues.apache.org/jira/browse/CASSANDRA-14346
State of Community Sidecars
Everyone builds their own
State of Community Tooling
State of Community Tooling 2
State of Community Tooling 3
State of Community Backup
And that’s not even all of ‘em … but do they solve our problems?
               Puppet/Chef  Terraform  CRON   Fabric / SSH in a for loop  Reaper  Priam  Jenkins  Sensu/Nagios
Bootstrap      No           Yes        No     Maybe                       No      Yes    Maybe    No
Maintenance    Yes          Maybe      Maybe  Yes                         No      Yes    Yes      No
Configuration  Yes          Maybe      No     Maybe                       No      Yes    Maybe    No
Monitoring     Maybe        No         Maybe  Maybe                       No      Yes    Maybe    Yes
Backup         Maybe        No         Maybe  Maybe                       Maybe   Yes    Yes      No
Repair         No           No         Sorta  Yes                         Yes     Soon   Yes      No
Easy to Use    Yes          No*        Yes    Yes                         Yes     No     Maybe    Yes
Multi-Cloud    Yes          Yes        Yes    Yes                         Yes     No     Yes      Yes
Does it Scale  Yes          Yes        Yes    Sorta                       Yes*    Yes    Sorta*   Yes
Oops.
Lessons Learned
Netflix has operated Cassandra using Priam and other sidecars for years
… We have learned a few things
Lessons learned from Priam
● Netflix OSS specific
● Not easy to use externally
● Multi-cloud?
● ~3 different implementations of everything
● Version compatibility
Lesson Learned: Control Plane
Orchestrator
Node1
Node2
Node3
1. Restart!
2. Restart!
3. Restart!
“Push Control”
● Relies on unreliable communication (ssh < http)
● Hard to make eventually consistent
● Hard to guarantee safety
● Very easy to implement
Lesson Learned: Control Plane
Orchestrator?
Node1
Node2
Node3
Restart C* older than Y!
Desire State Store (Cassandra, Config)
Anything to do?
Ah, time to restart!
Is it safe?
Is it safe?
“Pull Control”
● Write desires into state store (or config)
● Nodes coordinate through state store as needed
● Tradeoff: Hard to implement
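A minimal sketch of pull control as described above: the operator writes a desire ("restart C* older than Y") into a state store, and each node's sidecar periodically asks whether there is anything to do and whether it is safe. The in-memory `StateStore` is a hypothetical stand-in; a real store would need the claim step to be atomic (for example a compare-and-set), which is a large part of why this approach is hard to implement.

```python
class StateStore:
    """In-memory stand-in for the desired-state store (hypothetical)."""

    def __init__(self, restart_older_than):
        self.restart_older_than = restart_older_than  # the "Y" in the slide
        self.restarting = set()  # nodes currently holding a restart claim

    def try_claim(self, node, neighbors):
        """Claim the right to restart unless a neighbor is already restarting."""
        if self.restarting & set(neighbors):
            return False
        self.restarting.add(node)
        return True

    def release(self, node):
        self.restarting.discard(node)


def poll(node, started_at, neighbors, store, restart):
    """One sidecar polling pass: anything to do, is it safe, then act."""
    if started_at >= store.restart_older_than:
        return "nothing to do"
    if not store.try_claim(node, neighbors):
        return "not safe yet"
    try:
        restart()
    finally:
        store.release(node)
    return "restarted"
```

If a node misses a poll or the store is briefly unreachable, nothing breaks: the desire stays in the store and the node acts on its next pass, which is what makes pull control eventually consistent.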
Lesson Learned: Control Plane
Pull >> Push
Goals of C* management process
1. Easy to Use
2. Solves or eases most common problems
3. Pluggable
4. Scalable
Feedback on Scope/Design?
Management Process:
https://issues.apache.org/jira/browse/CASSANDRA-14395
Repair/Task Scheduler:
https://issues.apache.org/jira/browse/CASSANDRA-14346
Thank you.
