Nick Bailey
@nickmbailey
Intro to Cassandra Architecture
4.1 Cassandra - Introduction
Why does Cassandra Exist?
Dynamo Paper (2007)
• How do we build a data store that is:
• Reliable
• Performant
• “Always On”
• Nothing new and shiny
• 24 papers cited
Also the basis for Riak and Voldemort
BigTable (2006)
• Richer data model
• 1 key. Lots of values
• Fast sequential access
• 38 papers cited
Cassandra (2008)
• Distributed features of Dynamo
• Data model and storage from BigTable
• Graduated to a top-level Apache project on February 17, 2010
Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server

[Chart: VLDB benchmark — throughput (ops/sec) compared across Cassandra, HBase, Redis, and MySQL]
Cassandra - Fully Replicated
• Client writes local
• Data syncs across WAN
• Replication per Data Center
Cassandra for Applications
Summary
•The evolution of the internet and online data created new problems
•Apache Cassandra was based on a variety of technologies to solve these problems
•The goals of Apache Cassandra are all about staying online and performant
•Apache Cassandra is a database best used for applications, close to your users
4.1.2 Cassandra - Basic Architecture
Row
[Diagram: one row — Partition Key 1 with Column 1, Column 2, Column 3, Column 4]
Partition
[Diagram: several rows, all sharing Partition Key 1, each with Columns 1-4]
Partition with Clustering
[Diagram: rows sharing Partition Key 1, ordered within the partition by clustering values Cluster 1-4, each with Columns 1-3]
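
To make clustering concrete, here is a minimal CQL sketch (table and column names are hypothetical, not from the slides). Rows that share a partition key are stored together, sorted by the clustering column:

-- Hypothetical table: one partition per sensor, rows ordered by time
CREATE TABLE sensor_readings (
  sensor_id text,          -- partition key: groups rows into one partition
  reading_time timestamp,  -- clustering column: sorts rows within the partition
  value double,
  PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- Rows for one sensor come back newest-first, with no sort at query time
SELECT reading_time, value FROM sensor_readings WHERE sensor_id = 'sensor-42';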
Table
[Diagram: a table holding multiple partitions — groups of rows for Partition Key 1 and Partition Key 2, each row with Columns 1-4]
Keyspace
[Diagram: Keyspace 1 containing Table 1 and Table 2, each holding partitions for Partition Key 1 and Partition Key 2]
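
As a minimal CQL sketch (keyspace and table names assumed for illustration), a keyspace is created with its replication settings and the tables live inside it:

CREATE KEYSPACE keyspace1
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

CREATE TABLE keyspace1.table1 (
  partition_key text,
  column1 text,
  PRIMARY KEY ((partition_key))
);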
Node
[Diagram: a node is a single server in the cluster]
Token
•Each partition key is consistently hashed to a token
•With the default Murmur3 partitioner, tokens fall in the range -2^63 to 2^63 - 1
•Each node owns a range of those values
•A node's token marks the beginning of its range, which extends to the next node's token
•Virtual nodes break these ranges down further

[Diagram: a server owning one token range]
Token  Range
0      …
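
You can inspect the hashing from CQL itself: the built-in token() function returns the token a partition key maps to. A sketch against the raw_weather_data table defined later in this deck:

SELECT wsid, token(wsid)
FROM raw_weather_data
LIMIT 3;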
The cluster — one server

Token  Range
0      0-100
The cluster — two servers

Token  Range
0      0-50
51     51-100
The cluster — four servers

Token  Range
0      0-25
26     26-50
51     51-75
76     76-100
Summary
•Tables store rows of data, organized by column
•Partitions are similar data grouped by a partition key
•Keyspaces contain tables and define replication per data center
•Tokens determine where each node sits in the cluster's range of data
4.1.3 Cassandra - Replication, High Availability and Multi-datacenter
Replication
DC1: RF=1

Node      Primary
10.0.0.1  00-25
10.0.0.2  26-50
10.0.0.3  51-75
10.0.0.4  76-100

[Diagram: DC1 ring — each node holds only its primary range]
Replication
DC1: RF=2

Node      Primary  Replica
10.0.0.1  00-25    76-100
10.0.0.2  26-50    00-25
10.0.0.3  51-75    26-50
10.0.0.4  76-100   51-75

[Diagram: DC1 ring — each node also holds a replica of the preceding node's range]
Replication
DC1: RF=3

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

[Diagram: DC1 ring — each node holds its primary range plus replicas of the two preceding ranges]
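
The replication factor is a property of the keyspace. A sketch of raising RF from 1 to 3 on the hypothetical keyspace from earlier (existing data is then copied to the new replicas via repair):

ALTER KEYSPACE keyspace1
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};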
Replication
DC1: RF=3

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

[Diagram: a client writes to partition 15 — with RF=3, how many replicas must acknowledge? ???]
Consistency level

Consistency Level  Number of Nodes Acknowledged
One                One (read repair triggered)
Local One          One, read repair in local DC
Quorum             51% of replicas
Local Quorum       51% of replicas in the local DC
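
In cqlsh, the consistency level is a session setting applied to subsequent statements; a sketch using the raw_weather_data table from later slides:

CONSISTENCY QUORUM;
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 11, -6.0);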
Consistency
DC1: RF=3

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

[Diagram: a client writes to partition 15 with CL=One — a single replica acknowledges]
Consistency
DC1: RF=3

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

[Diagram: a client writes to partition 15 with CL=Quorum — two of the three replicas acknowledge]
Multi-datacenter
DC1: RF=3

Node      Primary  Replica  Replica
10.0.0.1  00-25    76-100   51-75
10.0.0.2  26-50    00-25    76-100
10.0.0.3  51-75    26-50    00-25
10.0.0.4  76-100   51-75    26-50

DC2: RF=3

Node      Primary  Replica  Replica
10.1.0.1  00-25    76-100   51-75
10.1.0.2  26-50    00-25    76-100
10.1.0.3  51-75    26-50    00-25
10.1.0.4  76-100   51-75    26-50

[Diagram: a client writes to partition 15 in DC1; the write is also sent across the WAN to the matching replicas in DC2]
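
Per-data-center replication such as DC1: RF=3, DC2: RF=3 is declared with NetworkTopologyStrategy; a minimal sketch (keyspace name assumed):

CREATE KEYSPACE weather
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};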
Summary
•Replication Factor indicates how many times your data is copied
•Consistency Level specifies how many replicas must acknowledge a read or write
•Replication along with Consistency Level is critical for uptime
4.2.1.1.3 Cassandra - Read and Write Path (Node Architecture)
Writes

CREATE TABLE raw_weather_data (
  wsid text,
  year int,
  month int,
  day int,
  hour int,
  temperature double,
  dewpoint double,
  pressure double,
  wind_direction int,
  wind_speed double,
  sky_condition int,
  sky_condition_text text,
  one_hour_precip double,
  six_hour_precip double,
  PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);
Writes

CREATE TABLE raw_weather_data (
  wsid text,
  year int,
  month int,
  day int,
  hour int,
  temperature double,
  PRIMARY KEY ((wsid), year, month, day, hour)
) WITH CLUSTERING ORDER BY (year DESC, month DESC, day DESC, hour DESC);

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 10, -5.6);
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 9, -5.1);
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 8, -4.9);
INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.3);
Write Path

Client issues:

INSERT INTO raw_weather_data (wsid, year, month, day, hour, temperature)
VALUES ('10010:99999', 2005, 12, 1, 7, -5.3);

[Diagram: on the node, the write is appended to the commit log (disk) and added to the memtable (memory); memtables flush to SSTables on disk, and compaction merges SSTables]
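
The commit log is what makes writes durable by default. A keyspace can opt out with the durable_writes option, which skips the commit log for its tables; a sketch with a hypothetical keyspace (rarely advisable outside throwaway data):

CREATE KEYSPACE scratch
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
  AND durable_writes = false;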
Read Path

Client issues:

SELECT wsid, hour, temperature
FROM raw_weather_data
WHERE wsid = '10010:99999'
  AND year = 2005 AND month = 12 AND day = 1
  AND hour >= 7 AND hour <= 10;

[Diagram: the node assembles the result by merging rows from the memtable (memory) with matching SSTables (disk)]
Summary
•By default, writes are durable
•Client receives ack when consistency level is achieved
•Reads must always go to disk
•Compaction is data housekeeping
Questions?