Bootstrap From Backups
Reducing cluster load while adding capacity
#CassandraSummit
instaclustr.com
Who am I and what do I do?
• Ben Bromhead
• Co-founder and CTO of Instaclustr -> www.instaclustr.com
• Instaclustr provides Cassandra-as-a-Service in the cloud.
• Currently in AWS, Google Cloud in private beta with more to come.
• We currently manage 50+ nodes for various customers, who do
various things with it.
Cassandra and Scaling
• Premise: We have an existing cluster and we need either more storage / better
performance / higher availability.
• Normally fairly awesome, most people do the following:
• Set seed nodes, start Cassandra.
• The node joins the ring and takes responsibility for some portion of the ring.
• Commence the bootstrap process: the joining node streams data from other nodes for its ranges, builds indexes, etc.
• Specifically, the node receives streamed SSTables that contain rows within the range it is now responsible for (the data component). A minimal sketch follows.
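A rough sketch of that normal join, assuming hypothetical IP addresses and the stock packaged service scripts:
  # cassandra.yaml on the new node: point seeds at existing nodes,
  # leave auto_bootstrap at its default (true)
  seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
            - seeds: "10.0.0.1,10.0.0.2"
  # start the service, then watch the join and the incoming streams
  sudo service cassandra start
  nodetool status     # the new node shows up as UJ (Up/Joining)
  nodetool netstats   # shows the SSTables being streamed in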
Not perfect, but getting better
• A joining node can violate consistency due to range movements; somewhat fixed in 2.1 (see CASSANDRA-2434).
• Adding a replacement node with the same address/range ownership is a different workflow, and the replace_address workflow is still tricky for some people (see CASSANDRA-7356 and the flag sketch below).
• Adding nodes to a cluster with multiple racks can also be tricky and prone to creating hotspots. This is mainly an operational issue.
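For reference, the replace workflow boils down to starting the replacement node with the dead node's address passed as a JVM flag (a sketch only; the IP address is hypothetical):
  # cassandra-env.sh on the replacement node
  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.5"   # address of the dead node being replaced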
A wild “fundamental issue” appears…
• Joining nodes add additional load on the existing nodes in the cluster.
• Joining nodes stream data from existing nodes (the node that used to own the range that is moving).
• This takes up valuable bandwidth and I/O.
• Key requirement: As a managed Cassandra service, we need to make all our operations as
side-effect free as possible.
• Key requirement: Our customers don’t want to worry about operation specific details.
How do we prevent this?
Solutions, part 1
Make sure your nodes never get stressed.
• Capacity planning (OpsCenter has some good tools). Traffic prediction.
Solutions, part 2
Make sure your nodes never get stressed.
• Over-provision.
Solutions, part 3
Make sure your nodes never get stressed.
• Ensure your startup / app / project / whatever never goes viral or gets featured in national media.
Solutions, part 4
If your nodes are already stressed, it is very hard to add capacity.
• Batten down the hatches and wait for a quiet time?
Solutions, part 5
If your nodes are already stressed, it is very hard to add capacity.
• You are a Cassandra wizard.
Solutions, part 6
If your nodes are already stressed, it is very hard to add capacity.
• Rebuild from another DC.
• Add the node with auto_bootstrap: false and run nodetool rebuild -- OTHER_DC (sketch below).
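A rough sketch of the rebuild-from-another-DC approach (OTHER_DC is a placeholder for the source data centre name):
  # cassandra.yaml on the new node: join the ring without streaming any data
  auto_bootstrap: false
  # once the node is up and in the ring, pull its ranges from the other DC
  nodetool rebuild -- OTHER_DC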
• All these solutions have various strengths and weaknesses.
• They have side-effects or are relatively costly.
• Still need to address:
• Key requirement: As a managed Cassandra service, we need to make all our operations as
side-effect free as possible.
• Key requirement: Our customers don’t want to worry about operation specific details.
Bootstrap from Backups!
• SSTables are immutable.
• SSTables are also the base unit of data that nodes stream to each
other.
• SSTables are what we back up.
• How about we stream the SSTables from the backup location instead of the live node?
• Define an arbitrary command that streams the SSTable to stdout.
• Cassandra will substitute some values (broadcast address and filename) into the command to help identify which SSTable to fetch.
• e.g. cat /mnt/some-nfs-mount/%source/%filename
• Cassandra will run the command in a separate process and read the SSTable from the process's stdout stream.
• If the process fails, the node streams the SSTable using the current streaming process. This makes it a performance optimisation rather than a replacement streaming mechanism (sketch below).
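Two hypothetical fetch commands (the mount point and bucket name are made up, and the way the command is wired into Cassandra is part of our not-yet-submitted patch). Cassandra substitutes %source and %filename before running the command and reads the SSTable from its stdout:
  # NFS-mounted backup location
  cat /mnt/some-nfs-mount/%source/%filename
  # S3 backup location: 'aws s3 cp ... -' streams the object to stdout
  aws s3 cp s3://my-backup-bucket/%source/%filename -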
Normal Bootstrap procedure
[Diagram: a three-node ring (nodes 1, 2, 3) plus a new node; SSTable1 is streamed to the new node from an existing node]
Normal Cluster with backups
[Diagram: nodes 1, 2, 3 copying SSTable1, SSTable2 … SSTableN to a NAS/S3 backup location]
Bootstrap from backup
[Diagram: the new node fetches SSTable1, SSTable2 … SSTableN from NAS/S3 instead of from the live nodes]
Bootstrap from backup - Catch up
[Diagram: nodes 1, 2, 3, the new node and the NAS/S3 backup location; the catch-up step after loading the backed-up SSTables]
How does it look in real life?
This is your cluster on regular bootstrap
[Chart: Operations, 0 to 30,000, over Minutes 0 to 60, during a regular bootstrap]
This is your cluster on bootstrap from backups
[Chart: Operations, 0 to 30,000, over Minutes 0 to 60, during a bootstrap from backups]
Why does this matter?
• Mostly side-effect free bootstrapping.
• Explore reactive scaling rather than predictive.
• Makes your cluster more cost effective to run.
When can I use this!?
• Not right now; we haven't even submitted it as a patch to the C* project (we will).
• Currently running in beta with a select few of our customers.
• We're not too sure how good an idea it is to use stdout as the stream mechanism. So far so good?
• It will probably need a refactor of the StreamMessage workflow; currently bootstrap from backups is a hack that doesn't fit the current model.
Questions
