Each Cassandra node is independent but cooperates with other nodes via a gossip protocol to share information. Gossip lets nodes learn the topology of the cluster and the state of other nodes without any single point of failure. Data is replicated across multiple nodes according to the replication strategy configured at the keyspace level, so the cluster stays resilient even if multiple nodes fail.
3. Cassandra node
A Cassandra node is a server within a
Cassandra ring.
A Cassandra node is independent of but
cooperates with other nodes within a
Cassandra ring.
[Diagram: a Cassandra ring of nine nodes]
4. Cassandra ring
A Cassandra ring is a collection of nodes that
form a complete token range.
[Diagram: a nine-node Cassandra ring labelled DC1]
5. Cassandra logical
data center
When more than one Cassandra ring exists within a cluster, each ring is called a Cassandra logical data center and is given a name.
[Diagram: two nine-node Cassandra rings, DC1 and DC2]
6. Cassandra cluster
A Cassandra cluster is a collection of
Cassandra logical data centers.
A cluster could consist of a single Cassandra
ring or many Cassandra logical data centers.
In this example the cluster consists of six Cassandra logical data centers across five physical data centers, with the two Australian logical data centers in two different AWS availability zones.
[Diagram: logical data centers US1, US2, EU1, JP1, plus AU1 in AZ1 and AU2 in AZ2]
8. Keyspace
Data in Cassandra is stored in a keyspace. A
keyspace is kind of analogous to a database in
the RDBMS world.
Each node in a Cassandra ring owns part of
the keyspace.
[Diagram: nine-node ring, each node owning part of the keyspace]
9. Tokens
Each node owns part of the keyspace via
ownership of a range of tokens.
A Cassandra ring, regardless of the number of
nodes always forms a complete token range.
So if you had 9 nodes each node would own
1/9th of the token range.
If you had only 3 nodes each node would own
1/3rd of the token range.
The actual token range is fixed. With the Murmur3 partitioner it runs from -2^63 to 2^63 - 1, i.e. from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
[Diagram: nine-node ring labelled CLDC, with simplified token ranges 0-9, 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80-89]
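The arithmetic above can be sketched in a few lines of Python. This is a simplified model assuming one evenly spaced token per node (real clusters often use vnodes); `token_ranges` is a hypothetical helper for illustration, not a Cassandra API:

```python
# Sketch: divide the full Murmur3 token range evenly among N nodes.
# Simplification: one evenly spaced token per node, no vnodes.

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
RANGE_SIZE = 2**64  # total number of tokens in the ring

def token_ranges(num_nodes):
    """Return the (start, end) token range owned by each node."""
    step = RANGE_SIZE // num_nodes
    starts = [MIN_TOKEN + i * step for i in range(num_nodes)]
    ranges = []
    for i, start in enumerate(starts):
        end = starts[i + 1] - 1 if i + 1 < num_nodes else MAX_TOKEN
        ranges.append((start, end))
    return ranges

# With 9 nodes each owns roughly 1/9th of the range; with 3, roughly 1/3rd.
for start, end in token_ranges(3):
    print(start, end)
```

However many nodes you have, the ranges are contiguous and cover the entire token space, which is exactly the "a ring always forms a complete token range" property.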
10. Partition keys
Partition keys (data design concept) map to
tokens.
Suppose I have chosen, in a table design, that my partition key is "email_address".
When we read or write to Cassandra with my email_address (alex@site.com), Cassandra first hashes the email address to produce a token. From the token, Cassandra knows via the ring's topology which node owns my data.
Note also that my data is replicated to other nodes; from the token Cassandra can also work out where my replicas are.
[Diagram: nine-node ring labelled CLDC with token ranges 0-9 through 80-89]
CREATE TABLE users (
    email_address text,
    password text,
    age int,
    PRIMARY KEY (email_address)
);
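The key-to-token-to-node lookup can be sketched in Python. A caveat: Cassandra hashes with Murmur3, while this sketch folds an MD5 digest into the same signed 64-bit range purely for illustration; `key_to_token` and `find_owner` are hypothetical helpers, not driver APIs:

```python
# Sketch: partition key -> token -> owning node.
# Stand-in hash: MD5 folded into a signed 64-bit token (Cassandra itself
# uses Murmur3, but any deterministic hash illustrates the lookup).

import bisect
import hashlib

def key_to_token(partition_key: str) -> int:
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

def find_owner(token: int, ring: list) -> str:
    """ring: sorted list of (token, node_name). The owner is the first node
    whose token is >= the key's token, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token) % len(ring)
    return ring[idx][1]

# Toy nine-node ring: each node's position is the hash of its name.
ring = sorted((key_to_token(f"node{i}"), f"node{i}") for i in range(1, 10))
token = key_to_token("alex@site.com")
print(find_owner(token, ring))  # always the same node for the same key
```

The point of the sketch: the lookup is a pure function of the key and the ring topology, so any node (or driver) can route a request without asking a coordinator service.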
11. Resiliency: the battle to produce a
data storage platform with no
single point of failure
12. Requirements
● No single point of failure within a
Cassandra ring
● Able to handle failure of nodes within
a Cassandra ring with zero
intervention and zero impact
● Able to handle failure of Cassandra
logical data centers with zero
intervention and zero impact
● Able to handle failure of physical
datacenters with zero intervention and
zero impact
● Able to handle failure of entire
geographic locations with zero
intervention and zero impact
Ask me about 30ms, go on, dare you.
US1 US2 EU1 JP1
AU1
AZ1
AU2
AZ2
13. No SPOF within a
Cassandra ring
To have no SPOF within a ring each node must
be able to survive independently from all other
nodes, but at the same time co-operate to
complete tasks.
This is not just a problem Cassandra has to answer; it is a wider problem of distributed computing.
So how do you do that?
14. No SPOF within a
Cassandra ring
Firstly, to have no SPOF within a ring you:
● cannot have a single point of control, which means you cannot introduce a controller (e.g. ZooKeeper)
● cannot use DNS
● cannot use shared storage (i.e. SAN, NAS, etc.)
● must replicate data to multiple nodes
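The "data must be replicated" point works without a controller because replica placement is a pure function of the ring: every node can compute the replica set locally. A minimal sketch of a SimpleStrategy-style walk, assuming replicas are the next RF distinct nodes clockwise from the owner (`replicas_for` is a hypothetical helper):

```python
# Sketch: SimpleStrategy-style replica placement. Walk clockwise around the
# ring from the owning node and take the next RF distinct nodes. Every node
# can compute this locally from ring topology alone: no controller needed.

import bisect

def replicas_for(token, ring, rf):
    """ring: sorted list of (token, node); returns rf replica nodes."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    out = []
    i = start
    while len(out) < rf:
        node = ring[i % len(ring)][1]
        if node not in out:  # skip duplicate nodes (matters with vnodes)
            out.append(node)
        i += 1
    return out

ring = sorted((i * 10, f"node{i}") for i in range(9))  # toy tokens 0,10,...,80
print(replicas_for(42, ring, 3))  # -> ['node5', 'node6', 'node7']
```

Because the walk is deterministic, a coordinator that receives a write knows the replica set immediately, and so does every other node when one of those replicas fails.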
15. Node
independence
So how do you do that?
Each node needs to be able to independently
source and understand from NO central
location:
● the topology of the ring
● the distributed schema design
● the discovery of other nodes
● the changing state of other nodes
16. Gossip protocol
Each Cassandra node is continually gossiping with other nodes about the state of the cluster: the distributed schema design, the topology of the ring, the latency of other nodes, the state of other nodes, and so on.
The gossip protocol converges on agreement within about one second, even in large clusters.
The gossip protocol is extremely lightweight from a network perspective.
[Diagram: nine nodes gossiping with one another]
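The core of gossip is easy to sketch: each node keeps a versioned state map and, every round, merges it with a few random peers, keeping the highest version per key. A toy simulation (simplified: real Cassandra gossip is a SYN/ACK/ACK2 digest exchange with heartbeats, but the spreading behaviour is the same):

```python
# Toy gossip simulation: each node holds a state map {node: version}; every
# round it merges state with a few random peers, keeping the highest version
# per key. Updates spread to all nodes in a handful of rounds.

import random

def gossip_round(states, fanout=3):
    nodes = list(states)
    for node in nodes:
        for peer in random.sample([n for n in nodes if n != node], fanout):
            # bidirectional merge: both sides keep the max version per key
            merged = {k: max(states[node].get(k, 0), states[peer].get(k, 0))
                      for k in states[node].keys() | states[peer].keys()}
            states[node] = dict(merged)
            states[peer] = dict(merged)

# Nine nodes, each knowing only its own version; node0 bumps its version to 5.
states = {f"node{i}": {f"node{i}": 1} for i in range(9)}
states["node0"]["node0"] = 5

rounds = 0
while any(s.get("node0", 0) < 5 for s in states.values()) and rounds < 50:
    gossip_round(states)
    rounds += 1
print(f"every node saw node0's update after {rounds} round(s)")
```

No node ever asks a central registry; agreement emerges from pairwise merges, which is why gossip has no single point of failure.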
17. Gossip spans
logical DCs
Gossip does not just work within a single ring; it spans all Cassandra logical data centers that form a cluster.
So any Cassandra node in one DC is aware of
the state of nodes in all other DCs.
[Diagram: gossip flowing within and between DC1 and DC2, each a nine-node ring]
18. Summary
Each node in a Cassandra cluster is
independent but cooperative.
You could kill eight of the nine nodes and the single remaining node would stay up and active, answering for the part of the token range it is authoritative for.