How to size up an Apache Cassandra cluster (Training)

How To Size Up A Cassandra Cluster
Joe Chu, Technical Trainer
jchu@datastax.com
April 2014
©2014 DataStax Confidential. Do not distribute without consent.

What is Apache Cassandra?
• Distributed NoSQL database
• Linearly scalable
• Highly available with no single point of failure
• Fast writes and reads
• Tunable data consistency
• Rack and Datacenter awareness

Peer-to-peer architecture
• All nodes are the same
• No master / slave architecture
• Less operational overhead for better scalability.
• Eliminates single point of failure, increasing availability.
[Diagram: a master/slave topology (one master, two slaves) contrasted with a ring of five peers]

Linear Scalability
• Operation throughput increases linearly with the number of
nodes added.

Data Replication
• Cassandra can write copies of data on different nodes.
• Replication factor setting determines the number of copies.
• Replication strategy can replicate data to different racks and
different datacenters.
INSERT INTO user_table (id, first_name, last_name)
VALUES (1, 'John', 'Smith');
[Diagram: with RF = 3, the write is copied to replica nodes R1, R2, and R3]

Node
• Instance of a running Cassandra process.
• Usually represents a single machine or server.

Rack
• Logical grouping of nodes.
• Allows data to be replicated across different racks.

Datacenter
• Grouping of nodes and racks.
• Each datacenter can have separate replication settings.
• May be in different geographical locations, but not always.

Cluster
• Grouping of datacenters, racks, and nodes that communicate
with each other and replicate data.
• Clusters are not aware of other clusters.

Consistency Models
• Immediate consistency
When a write is successful, subsequent reads are
guaranteed to return that latest value.
• Eventual consistency
When a write is successful, stale data may still be read but
will eventually return the latest value.

Tunable Consistency
• Cassandra offers the ability to choose between immediate and
eventual consistency by setting a consistency level.
• Consistency level is set per read or write operation.
• Common consistency levels are ONE, QUORUM, and ALL.
• For multi-datacenter clusters, additional levels such as
LOCAL_QUORUM and EACH_QUORUM are available to control
cross-datacenter traffic.

CL ONE
• Write: Success when at least one replica node has
acknowledged the write.
• Read: Only one replica node is given the read request.
[Diagram: client, coordinator, and replica nodes R1, R2, and R3 with RF = 3]

CL QUORUM
• Write: Success when a majority of the replica nodes have
acknowledged the write.
• Read: A majority of the replica nodes are given the read request.
• Majority = floor( RF / 2 ) + 1
[Diagram: client, coordinator, and replica nodes R1, R2, and R3 with RF = 3]

CL ALL
• Write: Success when all of the replica nodes have
acknowledged the write.
• Read: All replica nodes are given the read request.
[Diagram: client, coordinator, and replica nodes R1, R2, and R3 with RF = 3]
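
As a rough illustration of the three consistency levels above, the sketch below maps each level to the number of replicas that must respond. It also checks the commonly cited rule that a read sees the latest write when the write and read replica counts together exceed RF. The function names are hypothetical and are not part of any Cassandra driver.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority, using integer division
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_immediately_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when the write and read replica sets are forced to overlap."""
    writes = replicas_required(write_level, rf)
    reads = replicas_required(read_level, rf)
    return writes + reads > rf


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    result = is_immediately_consistent(write_level, read_level, rf=3)
    print(f"write={write_level} read={read_level} immediate={result}")
```

With RF = 3, writing and reading at QUORUM touches 2 + 2 replicas, so at least one replica in every read holds the latest write; ONE with ONE gives no such overlap.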

Log-Structured Storage Engine
• Cassandra storage engine inspired by Google BigTable
• Key to fast write performance on Cassandra
[Diagram: write path with the commit log, the memtable, and three SSTables]

Updates and Deletes
• SSTable files are immutable and cannot be changed.
• Updates are written as new data.
• Deletes write a tombstone, which marks a row or column(s) as
deleted.
• Updates and deletes are just as fast as inserts.
[Diagram: three versions of the same row, written over time]
id:1, first:John, last:Smith, timestamp: …405
id:1, first:John, last:Williams, timestamp: …621
id:1, deleted, timestamp: …999

Compaction
• Periodically an operation is triggered that will merge the data
in several SSTables into a single SSTable.
• Helps limit the number of SSTables to read.
• Removes old data and tombstones.
• SSTables are deleted after compaction.
[Diagram: SSTables before compaction]
id:1, first:John, last:Smith, timestamp:405
id:1, first:John, last:Williams, timestamp:621
id:1, deleted, timestamp:999
[Diagram: new SSTable after compaction]
id:1, deleted, timestamp:999
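
The merge shown in the diagram can be sketched in a few lines. This is a simplified model, not Cassandra's actual compaction code: each SSTable is assumed to be a dictionary keyed by row id, a value of None stands for a tombstone, and the newest timestamp wins. Like the diagram, the sketch keeps the tombstone in the new SSTable and does not model when tombstones are eventually purged.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: {row_id: (timestamp, value)}; value None is a tombstone.
SSTable = Dict[int, Tuple[int, Optional[dict]]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge SSTables, keeping only the newest version of each row."""
    merged: SSTable = {}
    for table in sstables:
        for row_id, (timestamp, value) in table.items():
            if row_id not in merged or timestamp > merged[row_id][0]:
                merged[row_id] = (timestamp, value)
    return merged


old_sstables = [
    {1: (405, {"first": "John", "last": "Smith"})},
    {1: (621, {"first": "John", "last": "Williams"})},
    {1: (999, None)},  # tombstone: the row was deleted
]
print(compact(old_sstables))  # {1: (999, None)}
```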

Cluster Sizing

Cluster Sizing Considerations
• Replication Factor
• Data Size
“How many nodes would I need to store my data set?”
• Data Velocity (Performance)
“How many nodes would I need to achieve my desired
throughput?”

Choosing a Replication Factor

What Are You Using Replication For?
• Durability or Availability?
• Each node has local durability (Commit Log), but replication
can be used for distributed durability.
• For availability, a recommended setting is RF=3.
• RF=3 is the minimum necessary to achieve both consistency
and availability using QUORUM and LOCAL_QUORUM.

How Replication Can Affect Consistency Level
• When RF < 3, you do not have as much flexibility when
choosing consistency and availability.
• With RF = 2, QUORUM = ALL: both replicas must respond, so a
single node failure blocks QUORUM operations.
[Diagram: client, coordinator, and replica nodes R1 and R2 with RF = 2]
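
To make the trade-off concrete, the following sketch tabulates how many replica nodes can be down while QUORUM operations still succeed, using the majority formula from the CL QUORUM slide. The helper names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Majority of replicas, using integer division."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replica nodes that can be down while QUORUM still succeeds."""
    return rf - quorum(rf)


for rf in (1, 2, 3, 5):
    print(f"RF={rf}: QUORUM={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

RF = 1 and RF = 2 tolerate no failed replica at QUORUM, RF = 3 tolerates one, and RF = 5 tolerates two.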

Using A Larger Replication Factor
• When RF > 3, there is more disk usage and higher latency for
operations requiring immediate consistency.
• If using eventual consistency, a consistency level of ONE will
have consistent performance regardless of the replication
factor.
• High availability clusters may use a replication factor as high
as 5.

Data Size

Disk Usage Factors
• Data Size
• Replication Setting
• Old Data
• Compaction
• Snapshots

Data Sizing
• Row and Column Data
• Row and Column Overhead
• Indices and Other Structures

Replication Overhead
• A replication factor > 1 will effectively multiply your data size
by that amount.
[Diagram: the same data set stored once, twice, and three times for RF = 1, RF = 2, and RF = 3]

Old Data
• Updates and deletes do not actually overwrite or delete data.
• Older versions of data and tombstones remain in the SSTable
files until they are compacted.
• This becomes more important for heavy update and delete
workloads.

Compaction
• Compaction needs free disk space to write the new
SSTable, before the SSTables being compacted are removed.
• Leave enough free disk space on each node to allow
compactions to run.
• Worst case for the Size-Tiered Compaction Strategy is free
space equal to 50% of the total data capacity of the node.
• For the Leveled Compaction Strategy, the free space needed is
about 10% of the total data capacity.

Snapshots
• Snapshots are hard-links or copies of SSTable data files.
• After SSTables are compacted, the disk space may not be
reclaimed if a snapshot of those SSTables was created.
Snapshots are created when:
• Executing the nodetool snapshot command
• Dropping a keyspace or table
• Incremental backups
• During compaction

Recommended Disk Capacity
• For current Cassandra versions, the ideal disk capacity is
approximately 1 TB per node if using spinning disks and 3-5 TB
per node if using SSDs.
• Larger per-node capacity is possible, but the resulting
performance may become the limiting factor.
• What works for you is still dependent on your data model
design and desired data velocity.
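
A back-of-the-envelope estimate of the node count needed for a given data size can combine the factors above: replication multiplies the data set, and compaction headroom reduces the usable part of each disk. The sketch below is a simplification under stated assumptions. It ignores row and column overhead, old data, and snapshots, and the function name and example figures are hypothetical.

```python
import math


def nodes_for_data(raw_data_gb: float, rf: int, node_disk_gb: float,
                   free_space_fraction: float) -> int:
    """Nodes needed to hold the replicated data set with compaction headroom."""
    replicated_gb = raw_data_gb * rf
    usable_per_node_gb = node_disk_gb * (1.0 - free_space_fraction)
    return math.ceil(replicated_gb / usable_per_node_gb)


# Hypothetical workload: 2 TB of data, RF = 3, 1 TB disks, 50% kept free.
print(nodes_for_data(2000, rf=3, node_disk_gb=1000, free_space_fraction=0.5))  # 12
```

Here 2,000 GB at RF = 3 becomes 6,000 GB on disk, and each 1 TB node contributes 500 GB of usable space, giving 12 nodes.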

Data Velocity (Performance)

How to Measure Performance
• I/O Throughput
“How many reads and writes can be completed per
second?”
• Read and Write Latency
“How fast should I be able to get a response for my read and
write requests?”

Sizing for Failure
• Cluster must be sized taking into account the performance
impact caused by failure.
• When a node fails, the corresponding workload must be
absorbed by the other replica nodes in the cluster.
• Performance is further impacted when recovering a node.
Data must be streamed or repaired using the other replica
nodes.
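
One simple way to size for failure is to require that the surviving nodes alone can meet the throughput target. The sketch below assumes throughput scales linearly with node count and that load is spread evenly, which is an idealization. The per-node throughput is a hypothetical figure that would have to come from benchmarking your own workload.

```python
import math


def nodes_for_throughput(target_ops_per_sec: float, ops_per_node: float,
                         failed_nodes: int = 1) -> int:
    """Nodes needed so the survivors still meet the target after failures."""
    survivors_needed = math.ceil(target_ops_per_sec / ops_per_node)
    return survivors_needed + failed_nodes


# Hypothetical figures: 50,000 ops/s target, 6,000 ops/s measured per node.
print(nodes_for_throughput(50_000, 6_000))  # 10
```

Nine nodes cover the target on their own, and the tenth provides the spare capacity for one failed node. The extra load during streaming or repair is not modeled here.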

Hardware Considerations for Performance
CPU
• Operations are often CPU-intensive.
• More cores are better.
Memory
• Cassandra uses JVM heap memory.
• Additional memory used as off-heap memory by Cassandra,
or as the OS page cache.
Disk
• Cassandra is optimized for spinning disks, but SSDs will perform better.
• Attached storage (SAN) is strongly discouraged.

Some Final Words…

Summary
• Cassandra allows flexibility when sizing your cluster, from a
single node to thousands of nodes.
• Your use case will dictate how you want to size and configure
your Cassandra cluster. Do you need availability? Immediate
consistency?
• The minimum number of nodes needed will be determined by
your data size, desired performance and replication factor.
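
The last point can be sketched as taking the largest of three lower bounds: nodes needed for data size, nodes needed for throughput, and the replication factor itself, since each replica must live on a different node. All names and figures below are hypothetical, and the estimate leaves out failure headroom and other real-world factors.

```python
import math


def minimum_nodes(replicated_data_gb: float, usable_gb_per_node: float,
                  target_ops_per_sec: float, ops_per_node: float, rf: int) -> int:
    """Largest of the size, velocity, and replication lower bounds."""
    by_size = math.ceil(replicated_data_gb / usable_gb_per_node)
    by_velocity = math.ceil(target_ops_per_sec / ops_per_node)
    return max(by_size, by_velocity, rf)


# Hypothetical cluster: 6 TB replicated, 500 GB usable per node,
# 50,000 ops/s target, 6,000 ops/s per node, RF = 3.
print(minimum_nodes(6000, 500, 50_000, 6_000, rf=3))  # 12
```

In this example data size is the binding constraint (12 nodes) rather than throughput (9 nodes) or the replication factor (3 nodes).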

Additional Resources
• DataStax Documentation
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html
• Planet Cassandra
http://planetcassandra.org/nosql-cassandra-education/
• Cassandra Users Mailing List
user-subscribe@cassandra.apache.org
http://mail-archives.apache.org/mod_mbox/cassandra-user/

Questions?

Thank You
We power the big data
apps that transform business.
