Cassandra
A Decentralized Structured Storage System
Avinash Lakshman (Facebook), Prashant Malik (Facebook)
Presented by Sameera Nelson
Outline …
• Introduction
• Data Model
• System Architecture
• Bootstrapping & Scaling
• Local Persistence
• Conclusion
What is Cassandra?
• Distributed storage system
• Manages structured data
• Highly available, with no single point of failure (SPoF)
• Not a relational data model
• Handles high write throughput
◦ Without sacrificing read efficiency
Motivation
• Operational requirements at Facebook
◦ Performance
◦ Reliability / dealing with failures
◦ Efficiency
◦ Continuous growth
• Application
◦ Inbox Search problem at Facebook
Related Work
• Google File System
◦ Distributed FS, single master/slave
• Ficus / Coda
◦ Distributed FS
• Farsite
◦ Distributed FS, no centralized server
• Bayou
◦ Distributed relational DB system
• Dynamo
◦ Distributed storage system
Data Model
Figure from Eben Hewitt’s slides.
• Table
◦ Multidimensional map indexed by a key
• Columns
◦ Grouped into column families
▪ Simple
▪ Super (nested column families)
• Column has
◦ Name / Value / Timestamp
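The nesting described on this slide can be pictured as maps inside maps. The following is a minimal sketch, not Cassandra's actual storage layout: the names `Column`, `Row` and `put` are illustrative assumptions, super columns are left out, and resolving conflicts by newest timestamp is assumed for the example.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    value: bytes
    timestamp: int  # assumed here to decide which version of a column wins


# row key -> column family -> column name -> Column (hypothetical layout)
Row = dict[str, dict[str, Column]]


def put(table: dict[str, Row], key: str, family: str, col: Column) -> None:
    """Store a column, keeping the version with the newest timestamp."""
    columns = table.setdefault(key, {}).setdefault(family, {})
    current = columns.get(col.name)
    if current is None or col.timestamp >= current.timestamp:
        columns[col.name] = col


inbox: dict[str, Row] = {}
put(inbox, "user42", "profile", Column("fname", b"john", timestamp=1))
put(inbox, "user42", "profile", Column("fname", b"jon", timestamp=2))
put(inbox, "user42", "profile", Column("fname", b"stale", timestamp=1))  # ignored: older
```

A lookup such as `inbox["user42"]["profile"]["fname"]` then walks key, column family and column name in that order.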
Data Model
Supported Operations
• insert(table, key, rowMutation)
• get(table, key, columnName)
• delete(table, key, columnName)
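To make the three operations concrete, here is a toy single-node, in-memory sketch. The class name `Store`, the use of a `(family, column)` tuple as the column name, and the dict-shaped row mutation are all assumptions for illustration; this is not the real client API.

```python
import time
from collections import defaultdict


class Store:
    """Toy single-node store exposing insert/get/delete (illustrative only)."""

    def __init__(self):
        # table -> key -> (family, column) -> (value, timestamp)
        self._data = defaultdict(lambda: defaultdict(dict))

    def insert(self, table, key, row_mutation):
        """row_mutation maps (family, column) -> value; all changes hit one row key."""
        ts = time.time_ns()
        for column_name, value in row_mutation.items():
            self._data[table][key][column_name] = (value, ts)

    def get(self, table, key, column_name):
        cell = self._data[table][key].get(column_name)
        return None if cell is None else cell[0]

    def delete(self, table, key, column_name):
        self._data[table][key].pop(column_name, None)


store = Store()
store.insert("users", "1745", {("name", "fname"): "john", ("name", "lname"): "smith"})
```

Every call addresses a single row key, which matches the per-key scope of the operations listed above.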
Query Language
CREATE TABLE users (
    user_id int PRIMARY KEY,
    fname text,
    lname text
);
INSERT INTO users (user_id, fname, lname)
VALUES (1745, 'john', 'smith');
SELECT * FROM users;
System Architecture
Fully Distributed …
• No single point of failure
Cassandra Architecture
• Partitioning
◦ Data distribution across nodes
• Replication
◦ Data duplication across nodes
• Cluster Membership
◦ Node management in the cluster (adding / removing nodes)
Partitioning
• The token ring
• Data is partitioned using consistent hashing
• Each key is assigned to the relevant partition
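The token ring and key assignment can be sketched in a few lines. This is a minimal illustration of consistent hashing, assuming MD5 as the hash function and a sorted list of tokens; the class and method names (`TokenRing`, `node_for_position`, `node_for`) are made up for the example.

```python
import bisect
import hashlib


class TokenRing:
    def __init__(self):
        self._tokens = []  # sorted token positions on the ring
        self._owner = {}   # token -> node name

    @staticmethod
    def hash_key(key: str) -> int:
        # MD5 is used purely as an example hash function.
        return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")

    def add_node(self, node: str, token: int) -> None:
        bisect.insort(self._tokens, token)
        self._owner[token] = node

    def node_for_position(self, position: int) -> str:
        """Walk clockwise to the first token >= position, wrapping at the end."""
        i = bisect.bisect_left(self._tokens, position)
        return self._owner[self._tokens[i % len(self._tokens)]]

    def node_for(self, key: str) -> str:
        return self.node_for_position(self.hash_key(key))


ring = TokenRing()
for name, token in [("A", 100), ("B", 200), ("C", 300)]:
    ring.add_node(name, token)
```

With these tokens, position 150 falls to node B, and any position past 300 wraps around to node A. Adding or removing a node only affects the keys in the range next to its token.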
Replication
• Based on the configured replication factor N
• Different replication policies
◦ Rack Unaware
▪ Replicate at the N-1 successor nodes on the ring
◦ Rack Aware
▪ A leader, elected via ZooKeeper, coordinates replica placement
◦ Datacenter Aware
▪ Similar to Rack Aware, with the leader chosen at the datacenter level
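For the Rack Unaware policy, replica selection reduces to walking the ring. The sketch below assumes the ring is given as a list of node names in token order and that N counts the coordinator itself; the function name `replicas` is hypothetical, and the rack-aware and datacenter-aware policies are not modeled.

```python
def replicas(ring_nodes, coordinator_index, n):
    """Return the coordinator plus its n-1 clockwise successors (rack unaware)."""
    count = min(n, len(ring_nodes))  # cannot place more replicas than nodes
    return [
        ring_nodes[(coordinator_index + i) % len(ring_nodes)]
        for i in range(count)
    ]


placement = replicas(["A", "B", "C", "D"], coordinator_index=2, n=3)
```

Here the coordinator is node C, so the data also lands on D and, after wrapping around the ring, on A.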
Cluster Membership
• Based on Scuttlebutt
• Efficient gossip-based mechanism
• Inspired by real-life rumor spreading
• Anti-entropy protocol
◦ Repairs replicated data by comparing and reconciling differences
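The rumor-spreading idea can be illustrated with a small simulation. This is a simplified push-pull gossip, not the Scuttlebutt protocol itself: each node holds a map of node name to heartbeat version, and two nodes reconcile by keeping the highest version seen for every entry. All names (`merge`, `gossip_round`, `views`) are assumptions for the sketch.

```python
import random


def merge(a: dict, b: dict) -> dict:
    """Reconcile two views by keeping the highest version seen per node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def gossip_round(views: dict, rng: random.Random) -> None:
    """Each node exchanges state with one randomly chosen peer."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = merge(views[node], views[peer])
        views[node] = dict(merged)
        views[peer] = dict(merged)


# Every node starts out knowing only its own heartbeat version.
views = {name: {name: 1} for name in ["A", "B", "C", "D", "E"]}
rng = random.Random(0)
rounds = 0
while any(len(v) < len(views) for v in views.values()) and rounds < 100:
    gossip_round(views, rng)
    rounds += 1
```

After a handful of rounds every node knows about every other node, even though no node ever contacts the whole cluster at once.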
Cluster Membership
Gossip Based
Cluster Membership
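• Failure Detection
◦ Accrual Failure Detector
▪ If a node is faulty, its suspicion level Φ keeps increasing:
Φ(t) → ∞ as t → ∞, so it eventually exceeds k
k: threshold above which the node is suspected
◦ If the node is correct, each heartbeat resets the suspicion level:
Φ(t) = 0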
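One way to compute a suspicion level with these properties is sketched below. It assumes heartbeat inter-arrival times follow an exponential distribution, so that Φ(t) = -log10(P(next heartbeat arrives later than t)) = t / (mean · ln 10), where t is the time since the last heartbeat. The class name, window size and threshold value are illustrative assumptions.

```python
import math
from collections import deque


class AccrualFailureDetector:
    def __init__(self, threshold: float, window: int = 1000):
        self.threshold = threshold             # k on the slide
        self.intervals = deque(maxlen=window)  # recent heartbeat inter-arrival times
        self.last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Suspicion level; grows linearly with the time since the last heartbeat."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_suspected(self, now: float) -> bool:
        return self.phi(now) > self.threshold


fd = AccrualFailureDetector(threshold=8.0)
for t in (0.0, 1.0, 2.0, 3.0):
    fd.heartbeat(t)
```

Right after a heartbeat the level is 0, and it climbs without bound while the node stays silent, so the threshold only controls how long the detector waits before suspecting the node.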
Bootstrapping & Scaling
• Bootstrapping
◦ A joining node selects a random token
◦ The token is persisted locally and gossiped to the cluster
• Scaling
◦ The Cassandra bootstrap algorithm is initiated by an operator
◦ The new node takes over part of the range of a heavily loaded node (the range is split)
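The scaling step can be sketched as choosing a token that splits the busiest node's range. Splitting exactly at the midpoint is an assumption made for this example, as are the function name `token_for_new_node`, the load metric, and the convention that a node owns the range from its predecessor's token up to its own.

```python
def token_for_new_node(tokens: dict, load: dict, ring_size: int) -> int:
    """Pick a token that halves the range of the most heavily loaded node.

    tokens: node -> token; a node owns the range (predecessor token, own token]
    load:   node -> load metric (hypothetical, e.g. bytes stored)
    """
    busiest = max(load, key=load.get)
    ordered = sorted(tokens.values())
    end = tokens[busiest]
    start = ordered[ordered.index(end) - 1]  # predecessor; index -1 wraps around
    # A zero span means a single node owns the whole ring.
    span = (end - start) % ring_size or ring_size
    return (start + span // 2) % ring_size


new_token = token_for_new_node(
    tokens={"A": 100, "B": 200, "C": 300},
    load={"A": 10, "B": 90, "C": 20},
    ring_size=400,
)
```

Node B is the busiest and owns (100, 200], so the new node is placed at 150 and takes over the lower half of that range.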
Local Persistence
• Write Operation
◦ In-memory data is flushed to disk once it crosses a threshold
◦ Writes to disk are sequential, and each data file gets an index
◦ Data files are merged in the background
◦ Rolling commit logs
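The write path can be sketched with plain Python structures standing in for disk files. The names (`Node`, `memtable`, `data_files`) and the entry-count flush threshold are assumptions for illustration; the commit log here is simply cleared after a flush rather than rolled by size.

```python
class Node:
    def __init__(self, flush_threshold: int):
        self.flush_threshold = flush_threshold
        self.commit_log = []  # stand-in for the append-only on-disk log
        self.memtable = {}    # in-memory key -> value
        self.data_files = []  # each flushed file: list of (key, value) sorted by key

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. log first, for durability
        self.memtable[key] = value            # 2. then update the in-memory structure
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Dump the memtable as one sorted, immutable file and reset the log."""
        self.data_files.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # these entries are now persisted in a data file

    def compact(self):
        """Merge all data files into one; newer files win on duplicate keys."""
        merged = {}
        for data_file in self.data_files:  # oldest first
            merged.update(data_file)
        self.data_files = [sorted(merged.items())]


node = Node(flush_threshold=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    node.write(key, value)
```

Because a flushed file is written once in key order and never modified, disk writes stay sequential; the merge step is what keeps the number of files from growing without bound.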
Local Persistence
• Read Operation
◦ All data is indexed on the primary key
◦ Column indices are maintained to avoid scanning every column on disk
Figure: read data path
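A read can be sketched as a lookup that consults the in-memory data first and then the on-disk files from newest to oldest. The function below is an illustrative assumption: each data file is a key-sorted list of (key, value) pairs, the key list plays the role of the per-file primary-key index, and column indices are not modeled.

```python
import bisect


def read(key, memtable: dict, data_files: list):
    """Check the memtable first, then data files from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for data_file in reversed(data_files):  # the newest file is last in the list
        keys = [k for k, _ in data_file]    # stand-in for the per-file key index
        i = bisect.bisect_left(keys, key)   # binary search on the sorted keys
        if i < len(keys) and keys[i] == key:
            return data_file[i][1]
    return None


memtable = {"a": 9}
data_files = [
    [("a", 1), ("b", 2)],  # older file
    [("b", 5), ("c", 6)],  # newer file
]
```

Looking up "b" returns the value from the newer file, since files are searched newest first and the first match wins.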
Conclusion
• Proven high scalability, performance, and wide applicability
• Very high update throughput while delivering low latency
• Future work
◦ Adding compression
◦ Supporting atomicity across keys
◦ Secondary index support
Thank You
