Introduction to
Cassandra
Wellington Ruby on Rails User Group
Aaron Morton @aaronmorton
24/11/2010
Disclaimer.
This is an introduction, not
a reference.
I may, from time to time
and for the best possible
reasons, bullshit you.
What do you already know
about Cassandra?
Get ready.
The next slide has a lot on
it.
Cassandra is a distributed,
fault tolerant, scalable,
column oriented data
store.
A word about “column
oriented”.
Relax.
It’s different to a row
oriented DB like MySQL.
So...
For now, think about keys
and values. Where each
value is a hash / dict.
Cassandra’s data model and
on disk storage are based
on the Google Bigtable
paper from 2006.
The distributed cluster
design is based on the
Amazon Dynamo paper
from 2007.
{'foo' => {'bar' => 'baz'}}
{key => {col_name => col_value}}
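That notation is just a nested Ruby Hash: the row key points at a hash of column names to column values. A minimal sketch:

```ruby
# The data model as a plain Ruby Hash: row key => { column name => column value }.
row = { 'foo' => { 'bar' => 'baz' } }

row['foo']         # => { 'bar' => 'baz' }   all columns for the row key 'foo'
row['foo']['bar']  # => 'baz'                one column value, looked up by name
```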
Easy.
Let's store ‘foo’ somewhere.
'foo'
But I want to be able to
read it back if one machine
fails.
Let's distribute it on 3 of
the 5 nodes I have.
This is the Replication
Factor.
Called RF or N.
Each node has a token that
identifies the upper value of
the key range it is
responsible for.
#1 <= E
#2 <= J
#3 <= O
#4 <= T
#5 <= Z
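A rough sketch of that lookup (illustrative only, not Cassandra's code): walk the nodes in token order and pick the first one whose token is greater than or equal to the key, wrapping around if none is.

```ruby
# Illustrative only: map a key to the node that owns its range.
# Tokens are the upper bound of each node's range, as on the slide above.
TOKENS = { '#1' => 'E', '#2' => 'J', '#3' => 'O', '#4' => 'T', '#5' => 'Z' }

def node_for(key)
  match = TOKENS.find { |_node, token| key.upcase <= token }
  match ? match.first : TOKENS.keys.first  # wrap around to the first node
end

node_for('foo')  # => "#2"  ('FOO' sorts after 'E' and before 'J')
```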
Client connects to a
random node and asks it to
coordinate storing the ‘foo’
key.
Each node knows about all
other nodes in the cluster,
including their tokens.
This is achieved using a
Gossip protocol. Every
second each node shares
its full view of the cluster
with 1 to 3 other nodes.
Our coordinator is node 5.
It knows node 2 is
responsible for the ‘foo’
key.
#1 <= E
#2 'foo'
#3 <= O
#4 <= T
#5 <= Z
Client
But there is a problem...
What if we have lots of
values between F and J?
We end up with a “hot”
section in our ring of
nodes.
That’s bad mmmkay?
You shouldn't have a hot
section in your ring.
mmmkay?
A Partitioner is used to
apply a transform to the
key. The transformed values
are also used to define a
node's range.
The Random Partitioner
applies an MD5 transform.
The range of all possible
key values is changed to a
128-bit number.
There are other
Partitioners, such as the
Order Preserving Partitioner.
But start with the Random
Partitioner.
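Roughly what the Random Partitioner does, sketched with Ruby's standard Digest library (an illustration of the idea, not Cassandra's implementation):

```ruby
require 'digest/md5'

# Hash the key with MD5 and treat the 16-byte digest as a 128-bit integer.
# Node tokens then live in this 0 .. 2**128 - 1 space instead of the raw key
# space, which spreads keys evenly around the ring.
def random_partitioner_token(key)
  Digest::MD5.hexdigest(key).to_i(16)
end

random_partitioner_token('foo')  # => a large integer between 0 and 2**128 - 1
```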
Let’s pretend all keys are
now transformed to an
integer between 0 and 9.
Our 5 node cluster now
looks like this.
#1 <= 2
#2 <= 4
#3 <= 6
#4 <= 8
#5 <= 0
Pretend our ‘foo’ key
transforms to 3.
#1 <= 2
#2 "3"
#3 <= 6
#4 <= 8
#5 <= 0
Client
Good start.
But where are the replicas?
We want to replicate the
‘foo’ key 3 times.
A Replication Strategy is
used to determine which
nodes should store replicas.
It’s also used to work out
which nodes should have a
value when reading.
Simple Strategy orders the
nodes by their token and
places the replicas around
the ring.
Network Topology Strategy
is aware of the racks and
Data Centres your servers
are in. Can split replicas
between DC’s.
Simple Strategy will do in
most cases.
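A sketch of what Simple Strategy does with the pretend 0-9 tokens (illustrative only): the first replica goes on the node that owns the key's token, and the remaining RF - 1 replicas go on the next nodes clockwise around the ring.

```ruby
# Nodes ordered by token, using the pretend 0..9 ring from the earlier slides.
# Node #5 owns the wrap-around range (9 and 0), represented here as token 10.
RING = [
  { name: '#1', token: 2 },
  { name: '#2', token: 4 },
  { name: '#3', token: 6 },
  { name: '#4', token: 8 },
  { name: '#5', token: 10 },
]

# Illustrative Simple Strategy: primary replica first, then the next RF - 1
# nodes clockwise around the ring.
def replicas_for(key_token, rf)
  primary = RING.index { |node| key_token <= node[:token] } || 0
  (0...rf).map { |i| RING[(primary + i) % RING.size][:name] }
end

replicas_for(3, 3)  # => ["#2", "#3", "#4"]  (matches the 'foo' example)
```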
Our coordinator will send
the write to all 3 nodes at
once.
#1 <= 2
#2 "3"
#3 "3"
#4 "3"
#5 <= 0
Client
Once the 3 replicas tell the
coordinator they have
finished, it will tell the client
the write completed.
Done.
Let’s go home.
Hang on.
What about fault tolerance?
What if node #4 is down?
#1 <= 2
#2 "3"
#3 "3"
#4 "3"
#5 <= 0
Client
The client must specify a
Consistency Level for each
operation.
Consistency Level specifies
how many nodes must
agree before the operation
is a success.
For reads it is known as R.
For writes it is known as W.
Here are the simple ones
(there are a few more)...
One.
The coordinator will only
wait for one node to
acknowledge the write.
Quorum.
N/2 + 1
All.
The coordinator waits for all
replicas to acknowledge the
write.
The cluster will work to
eventually make all copies
of the data consistent.
To get consistent behaviour
make sure that R + W > N.
You can do this by...
Always using Quorum for
read and writes.
Or...
Use All for writes and One
for reads.
Or...
Use All for reads and One
for writes.
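A quick sanity check of those combinations for RF = 3 (so Quorum = N/2 + 1 = 2), sketched below:

```ruby
N = 3               # replication factor (RF)
QUORUM = N / 2 + 1  # => 2

# Consistent behaviour needs the read and write node sets to overlap: R + W > N.
def consistent?(r, w)
  r + w > N
end

consistent?(QUORUM, QUORUM)  # => true   Quorum reads + Quorum writes (2 + 2 > 3)
consistent?(1, N)            # => true   One for reads, All for writes
consistent?(N, 1)            # => true   All for reads, One for writes
consistent?(1, 1)            # => false  One + One gives no such guarantee
```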
Try our write again, using
Quorum consistency level.
The coordinator will wait
for 2 nodes to complete the
write before telling the
client it has completed.
#1 <= 2
#2 "3"
#3 "3"
#4 "3"
#5 <= 0
Client
What about when node 4
comes online?
It will not have our “foo”
key.
Won’t somebody please
think of the “foo” key!?
During our write the
coordinator will send a
Hinted Handoff to one of
the online replicas.
Hinted Handoff tells the
node that one of the
replicas was down and
needs to be updated later.
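Conceptually (a very rough sketch of the idea, not Cassandra's internals), a hint is just a note saying "this write belongs to a replica that was down; replay it when that node is back":

```ruby
# Very rough sketch of the idea behind Hinted Handoff.
hints = []

# During the write, node #4 is down, so an online replica stores a hint.
hints << { target: '#4', key: 'foo', value: '3' }

# Later, when #4 is seen as up again, its hints are replayed and discarded.
hints.select { |h| h[:target] == '#4' }.each do |hint|
  # deliver hint[:key] => hint[:value] to node #4 here
end
hints.reject! { |h| h[:target] == '#4' }
```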
#1 <= 2
#2 "3"
#3 "3"
#4 "3"
#5 <= 0
Client
send "3" to #4
When node 4 comes back
up, node 3 will eventually
process the Hinted
Handoffs and send the
“foo” key to it.
#1 <= 2
#2 "3"
#3 "3"
#4 "3"
#5 <= 0
Client
What if the “foo” key is
read before the Hinted
Handoff is processed?
#1 <= 2
#2 "3"
#3 "3"
#4 ""
#5 <= 0
Client
send "3" to #4
At our Quorum CL the
coordinator asks all nodes
that should have replicas to
perform the read.
Once CL nodes have
returned, their values are
compared.
If they do not match, a Read
Repair process is kicked off.
A timestamp provided by
the client during the write
is used to determine the
“latest” value.
The “foo” key is written to
node 4, and consistency
achieved, before the
coordinator returns to the
client.
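The "pick the latest" step is easy to picture (an illustrative sketch, not Cassandra's code): each replica returns its version of the column along with the client-supplied timestamp, and the highest timestamp wins.

```ruby
# Illustrative only: resolve divergent replica responses by client timestamp.
responses = [
  { node: '#2', value: '3', timestamp: 1290556800 },
  { node: '#3', value: '3', timestamp: 1290556800 },
  { node: '#4', value: '',  timestamp: 0 },  # the stale replica
]

winner = responses.max_by { |r| r[:timestamp] }
winner[:value]  # => "3" -- this value is written back to the stale replica
```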
At a lower CL the Read
Repair happens in the
background and is
probabilistic.
We can force Cassandra to
repair everything using the
Anti Entropy feature.
Anti Entropy is the main
feature for achieving
consistency. RR and HH are
optimisations.
Anti Entropy is started
manually via the command line
or Java JMX.
Great so far.
But ratemylolcats.com is
going to be huge.
How do I store 100 Million
pictures of cats?
Add more nodes.
More disk capacity, disk IO,
memory, CPU, network IO.
More everything.
Linear scaling.
Clusters of 100+ TB.
And now for the data
model.
From the outside in.
A Keyspace is the container
for everything in your
application.
Keyspaces can be thought
of as Databases.
A Column Family is a
container for ordered and
indexed Columns.
Columns have a name,
value, and timestamp
provided by the client.
The CF indexes the
columns by name and
supports get operations by
name.
CF’s do not define which
columns can be stored in
them.
Column Families have a
large memory overhead.
You typically have few (<10)
CF’s in your Keyspace. But
there is no limit.
We have Rows.
Rows have a key.
Rows store columns in one
or more Column Families.
Different rows can store
different columns in the
same Column Family.
User CF
key => fred: {username => fred, d_o_b => 04/03}
key => bob: {username => bob, city => wellington}
A key can store different
columns in different
Column Families.
User CF
key => fred: {username => fred, d_o_b => 04/03}
Timeline CF
key => fred: {09:01 => tweet_60, 09:02 => tweet_70}
Here comes the Super
Column Family to ruin it all.
Arrgggghhhhh.
A Super Column Family is a
container for ordered and
indexed Super Columns.
A Super Column has a
name and an ordered and
indexed list of Columns.
So the Super Column
Family just gives another
level to our hash.
Social Super CF
key => fred: {
  following => {bob => 01/01/2010, tom => 01/02/2010},
  followers => {bob => 01/01/2010}}
How about some code?
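For a Rails crowd the natural starting point at the time was the Thrift-based 'cassandra' gem. A minimal sketch, assuming that gem, a local node, and a keyspace named 'Lolcats' with a User CF already defined; exact method names and options vary by gem and Cassandra version, so treat it as a sketch rather than a reference:

```ruby
require 'cassandra'

# Connect to a keyspace on a local node (the keyspace name is an assumption here).
client = Cassandra.new('Lolcats', '127.0.0.1:9160')

# Write a row: key 'fred' in the User CF, with two columns.
client.insert(:User, 'fred', { 'username' => 'fred', 'd_o_b' => '04/03' })

# Read the row back: columns come back as a hash, ordered by column name.
client.get(:User, 'fred')  # => roughly {'d_o_b' => '04/03', 'username' => 'fred'}
```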