INTRODUCTION: SELECTED CASES
Who uses Cassandra?
eBay has Cassandra supporting multiple
applications (Social Signals, Hunch, and many
time series use cases) with clusters spanning
several data centers.
Netflix is using Cassandra on AWS as a key
infrastructure component of its globally distributed
Shazam uses a Cassandra cluster to power their
and many others…
See http://www.datastax.com/cassandrausers
INTRODUCTION: MAIN ADVANTAGES
The main advantages of Cassandra are:
• Fast writes.
• Tunable consistency.
• Integration with Hadoop.
ARCHITECTURE: FAST WRITES
Cassandra is very fast on writes because it
uses a log-structured merge tree (LSM-tree).
Process of inserting a new record into Cassandra
ARCHITECTURE: FAST WRITES
How the LSM-tree is implemented: memtables and SSTables
Commit log – all data is written to the
commit log for durability.
SSTables are immutable. A row is typically stored across multiple
SSTable files.
Each SSTable has a bloom filter associated with it. The
bloom filter is used to check if a requested row key exists in
the SSTable before doing any disk seeks.
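The bloom filter check can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the class names BloomFilter and SSTable, the filter size, the number of hashes and the use of SHA-256 are all assumptions made for illustration. It shows why a negative answer from the filter lets a read skip the SSTable entirely.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Hypothetical immutable table with an attached Bloom filter."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)
        self.disk_reads = 0  # counts simulated disk seeks

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None          # definitely absent: no disk seek needed
        self.disk_reads += 1     # possibly present: go to "disk"
        return self.rows.get(key)
```

A key that was added always passes the filter; a key that was never added is usually rejected before any simulated disk seek happens.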
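Deleted data is not immediately removed from disk. Instead, a marker
called a tombstone is written. A deleted column can reappear if a replica
that missed the delete is not repaired before the tombstone is removed.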
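The write path on this slide (commit log, memtable, immutable SSTables, tombstones) can be illustrated with a small in-memory model. This is only a sketch under simplifying assumptions: the TinyLSM class, its memtable_limit parameter and the TOMBSTONE marker are hypothetical names, "disk" is a Python list, and compaction is left out.

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class TinyLSM:
    """Toy LSM-tree: commit log + memtable + immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # mutable, in memory
        self.sstables = []     # immutable snapshots, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # logged first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        # A delete is just another write: the old value stays on disk.
        self.write(key, TOMBSTONE)

    def flush(self):
        if self.memtable:
            # SSTables are sorted by key and never modified afterwards.
            snapshot = {k: self.memtable[k] for k in sorted(self.memtable)}
            self.sstables.append(snapshot)
            self.memtable = {}

    def read(self, key):
        # Newest data wins: memtable first, then SSTables newest to oldest.
        for table in [self.memtable] + self.sstables[::-1]:
            if key in table:
                value = table[key]
                return None if value is TOMBSTONE else value
        return None
```

In this model a deleted key reads as missing even though its old value is still present in an older SSTable, which is the situation the tombstone exists to handle.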
ARCHITECTURE: NETWORK ARCHITECTURE
• All nodes are peers.
• The client specifies a set of Cassandra nodes and
connects to the first live node.
• Nodes use a gossip protocol to exchange state information.
PARTITIONING & REPLICATION: DATA PARTITIONING
Partitioner – determines where the first replica of a row lives in the ring.
RandomPartitioner – the default strategy; spreads load roughly evenly across all
nodes.
ByteOrderedPartitioner – orders rows lexically by key bytes; allows
range scans, but is not recommended.
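The idea behind hash-based partitioning can be sketched as follows. This approximates what RandomPartitioner does (an MD5 hash of the row key mapped onto a token ring) but is not its exact token computation. The function names, the node names and the evenly spaced tokens are assumptions for illustration.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # size of the token space assumed in this sketch


def token_for(key):
    """Map a row key to a token by hashing it with MD5."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def evenly_spaced_ring(node_names):
    """Assign each node one token, spread evenly around the ring."""
    step = RING_SIZE // len(node_names)
    return [(i * step, name) for i, name in enumerate(node_names)]


def first_replica(ring, key):
    """Return the node owning the key: first node token >= key token."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key)) % len(ring)  # wrap
    return ring[index][1]
```

Because the hash scatters keys uniformly, each node ends up with a similar share of rows, at the cost of losing key order and therefore ordered range scans.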
PARTITIONING & REPLICATION: REPLICATION
Replication = replication factor
+ replica placement strategy
Replica placement strategy:
you have information about the network map of your nodes;
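How the replication factor and the placement strategy combine can be shown with the simplest placement rule: once the partitioner has picked the first replica, the remaining replicas go to the next nodes clockwise on the ring, ignoring network topology. The function name and node names below are hypothetical, and a topology-aware strategy would choose nodes differently.

```python
def simple_placement(ring_nodes, first_index, replication_factor):
    """Pick replicas by walking the ring clockwise from the first replica.

    ring_nodes         -- node names in ring (token) order
    first_index        -- position of the node chosen by the partitioner
    replication_factor -- total number of copies to keep
    """
    count = min(replication_factor, len(ring_nodes))  # at most one per node
    return [
        ring_nodes[(first_index + offset) % len(ring_nodes)]
        for offset in range(count)
    ]
```

For example, with four nodes and a replication factor of 3, a row whose first replica is the last node on the ring wraps around to the first two nodes.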
DATA MANAGEMENT: DATA ACCESSING
READS + WRITES:
• Tunable consistency. The consistency level specifies
how many replicas must respond to a read/write
request (writes are still sent to all replicas).
• Batches – set a global consistency level and a
client-supplied timestamp for all columns
written by the statements in the batch.
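The arithmetic behind tunable consistency is short enough to write down. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names here are hypothetical helpers, not part of any Cassandra client.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must share a replica with every write set."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, quorum reads plus quorum writes (2 + 2) overlap, while reading and writing at a single replica (1 + 1) trades that guarantee for lower latency.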
DATA MANAGEMENT: ACID
• Atomicity – writes are atomic at row level.
• Consistency – tunable consistency.
• Isolation – writes are invisible until they are
• Durability – writes are durable.
• Read repair, anti-entropy node repair, hinted handoff.
DATA MODEL: CASSANDRA'S DATA MODEL
Relational databases – you design the
schema based on entities and
relationships.
Cassandra – you design the schema based
on the queries you would like to
run.
DATA MODEL: INDEXES
An index is a data structure that allows for
fast, efficient lookup of data matching a given
condition.
Primary key – the unique key used to identify
each row in a table.
Secondary indexes – refer to indexes on
column values.
DATA MODEL: CQL3
cqlsh> INSERT INTO users (user_name, password)
       VALUES ('jsmith', 'ch@ngem3a');
cqlsh> SELECT * FROM users WHERE user_name = 'jsmith';

 user_name | password  | state
-----------+-----------+-------
    jsmith | ch@ngem3a |  null