Cassandra Jahangir Mohammed firstname.lastname@example.org
What is Cassandra?
  - Distributed data store
  - O(1) DHT (distributed hash table)
  - Column-oriented
  - Dynamo + Bigtable
Why not RDBMS?
  - Many-to-many relationships -> joins -> denormalization -> multiple copies of data (redundancy)
  - Rigid schema
  - Vertical scaling is easier than horizontal
  - ACID, distributed transactions, two-phase commit
  - Slower writes
CAP Theorem
  - Consistency: all clients read the same data at the same time.
  - Availability: the service is always up and running.
  - Partition tolerance: the system as a whole keeps operating despite network issues.
Features
  - Proven
  - Rich data model
  - Scalable
  - Distributed & decentralized
  - Cross-datacenter support
  - High-performance writes and reads
  - No SPOF (single point of failure)
  - Schema-free
  - Tunable consistency
Limitations
  - No ACID transactions (a limitation only if you need them)
  - Eventually consistent (consistency is tunable, as a trade-off with performance)
ARCHITECTURE
  Ring
    - Each node has a unique token
    - Tokens range from 0 to 2**127
    - A key's MD5 hash determines the node that stores it
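RING
  [Figure: ring of nodes A to F, with ring positions labeled 0, 1/2, and 1; keys are placed on the ring at h(key1) and h(key2); replication factor N=3]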
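The ring lookup described above can be sketched roughly as follows. This is a minimal illustration, not Cassandra's actual code: the names `token_for`, `owner_of_token`, and `owner_of_key` are hypothetical, and reducing the MD5 value modulo 2**127 is a simplifying assumption about how the hash is mapped into the token range.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space quoted on the slide


def token_for(key: str) -> int:
    """Map a row key to a token via MD5 (the modulo is a simplification)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner_of_token(ring, token):
    """Return the node whose token is the first one >= `token`.

    `ring` is a list of (token, node_name) pairs sorted by token.
    Past the highest token, the lookup wraps around to the first node.
    """
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token)
    return ring[index % len(ring)][1]


def owner_of_key(ring, key):
    """Hash the key, then walk the ring to find the responsible node."""
    return owner_of_token(ring, token_for(key))
```

For example, with four hypothetical nodes at evenly spaced tokens, `owner_of_key(ring, "key1")` always returns the same node for the same key, which is what makes the O(1) lookup possible without a central directory.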
ARCHITECTURE
  P2P
    - All nodes are identical
    - No “master” node
  Gossip
    - Protocol for intra-ring communication
    - Each node has state information about the other nodes
  Anti-entropy & Read Repair
    - Replica synchronization mechanisms
    - Occurs during major compaction
    - Uses Merkle trees
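To show why Merkle trees suit replica synchronization, here is a small sketch of computing a tree root over a replica's rows. Two replicas can exchange only the root hash and know immediately whether their data differs. The function name `merkle_root`, the use of SHA-256, and the `repr`-based row encoding are assumptions made for illustration only.

```python
import hashlib


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """Hash each row into a leaf, then pair hashes upward until one root remains."""
    level = [_hash(repr(row).encode("utf-8")) for row in rows] or [_hash(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

If the roots match, the replicas hold the same rows and no data needs to be transferred; if they differ, a fuller implementation would compare subtrees to narrow down the ranges that need repair.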
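READ REPAIR
  [Figure: the client sends a query to the Cassandra cluster. The closest replica receives the full query and returns the result; the other replicas (Replicas A, B, C are shown) receive digest queries and return digest responses. Read repair runs if the digests differ. The result is returned to the client.]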
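The read-repair flow in the figure can be approximated with the sketch below. It assumes each replica is a plain dictionary mapping a key to a `(value, timestamp)` pair, that the first replica in the list is the closest one, and that every replica holds the key; `digest` and `read_with_repair` are hypothetical names.

```python
import hashlib


def digest(value, timestamp):
    """Compact fingerprint of a cell, standing in for a digest response."""
    return hashlib.md5(f"{value}|{timestamp}".encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Read from the closest replica and repair all replicas if digests differ."""
    result = replicas[0][key]                              # full read
    others = [digest(*replica[key]) for replica in replicas[1:]]  # digest reads
    if any(d != digest(*result) for d in others):
        # Digests disagree: the cell with the newest timestamp wins
        # and is written back to every replica.
        newest = max((replica[key] for replica in replicas), key=lambda cell: cell[1])
        for replica in replicas:
            replica[key] = newest
        result = newest
    return result
```

Sending digests rather than full values keeps the common case cheap: full data moves only when the replicas actually disagree.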
WRITE PATH
  - Commit log: every write is recorded here first
  - Memtable: in-memory data structure, written after the commit log
  - SSTable: immutable on-disk table, created when a memtable is flushed to disk
WRITE PATH
  [Figure: a write for Key (CF1, CF2, CF3) is appended, binary serialized, to the commit log on a dedicated disk, and applied to one memtable per column family (CF1, CF2, ...). A FLUSH writes a memtable to a data file on disk. The figure also carries the label "Number of Objects".]
  Data file on disk, one entry per key:
  <Key name><Size of key data><Index of columns/supercolumns><Serialized column family>
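A toy version of the write path above might look like this. The class `ColumnFamilyStore`, its `flush_threshold` parameter (counted in rows), and the in-memory lists standing in for disk files are all assumptions for illustration; real flush triggers and file formats are more involved.

```python
class ColumnFamilyStore:
    """Toy write path: commit log first, then memtable, then flush to an SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory rows: key -> {column: value}
        self.sstables = []     # immutable, sorted snapshots of flushed memtables
        self.flush_threshold = flush_threshold

    def write(self, key, column, value):
        self.commit_log.append((key, column, value))        # 1. durability
        self.memtable.setdefault(key, {})[column] = value   # 2. in-memory update
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                                     # 3. flush when full

    def flush(self):
        # Freeze the memtable into a sorted, immutable tuple and start a new one.
        sstable = tuple(
            (key, tuple(sorted(columns.items())))
            for key, columns in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        self.memtable = {}
```

Because a write only appends to a log and updates memory, it never has to seek on disk, which is one reason writes are fast in this design.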
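ARCHITECTURE
  Bloom filter
    - Performance booster
    - Fast, nondeterministic (probabilistic) algorithm: it may report false positives but never false negatives
    - Kept in memory
    - Used during read operations
  Tombstones
    - Deletion marker (soft delete)
    - Markers older than a set time are garbage-collected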
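As a rough illustration of how a Bloom filter lets a read skip SSTables that cannot contain a key, consider the sketch below. The class name, the bit-array size, and the choice of salted MD5 hashes are assumptions, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter backed by a Python integer used as a bit array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            h = hashlib.md5(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h, "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> p) & 1 for p in self._positions(key))
```

A `False` answer lets the read path skip the disk lookup entirely; a `True` answer still requires checking the SSTable, because it may be a false positive.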
HINTED HANDOFF & COMPACTION
  Hinted Handoff
    - The node responsible for a write is down
    - The coordinator creates a hint so the write can be delivered once the node is back
  Compaction
    - Merges SSTables
    - Keys merged
    - Columns combined
    - Tombstones discarded
    - New index created
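The compaction steps listed above can be sketched as a merge over several SSTables. Here each SSTable is assumed to be a dictionary mapping `(key, column)` to `(value, timestamp)`, a value of `None` stands in for a tombstone, and `gc_grace` is a hypothetical parameter for the age after which tombstones are dropped.

```python
TOMBSTONE = None  # assumed marker for a deleted cell


def compact(sstables, now, gc_grace):
    """Merge SSTables, keep the newest version of each cell, drop expired tombstones."""
    merged = {}
    for table in sstables:
        for cell, (value, timestamp) in table.items():
            current = merged.get(cell)
            if current is None or timestamp > current[1]:
                merged[cell] = (value, timestamp)  # newest write wins
    # Build the new sorted table, discarding tombstones older than the grace period.
    return {
        cell: (value, timestamp)
        for cell, (value, timestamp) in sorted(merged.items())
        if not (value is TOMBSTONE and now - timestamp > gc_grace)
    }
```

Keeping tombstones until the grace period has passed gives replicas that missed the delete time to learn about it before the marker disappears.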
PARTITIONER
  - Decides where a row key (and its data) is placed in the ring
  Random Partitioner
    - MD5 hash
    - Spreads keys evenly
    - Inefficient range queries
  Order-Preserving Partitioner
    - Rows sorted by key
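The difference between the two partitioners for range queries can be seen by comparing the order in which keys land on the ring. This is a simplified sketch: `random_token`, `order_preserving_token`, and `ring_order` are hypothetical helpers, and the key itself is used as the order-preserving token.

```python
import hashlib


def random_token(key: str) -> int:
    """Random partitioner: the token is derived from the key's MD5 hash."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def order_preserving_token(key: str) -> str:
    """Order-preserving partitioner: the token keeps the key's sort order."""
    return key


def ring_order(keys, token_fn):
    """Return keys in the order they would appear around the ring."""
    return sorted(keys, key=token_fn)
```

With `order_preserving_token`, keys such as `user1` through `user5` sit next to each other, so a range scan touches neighboring nodes. With `random_token`, the same keys are scattered around the ring, which balances load but forces a range query to visit many nodes.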
DATA MODEL
  - Keyspace: like a database; container for column families (CFs).
  - Column Family: like a table, though not exactly a relational database table; container of rows.
  - Row: sorted collection of columns.
  - Column: basic unit of data; a triplet of name, value, and timestamp.
DATA MODEL
  - Super Column: special column; a sorted associative array of columns (a map of maps), only one level deep.
  - Super Column Family: container of rows having super columns.
  - 4-D DHT = Standard CF: [Keyspace][ColumnFamily][Key][Column]
  - 5-D DHT = Super CF: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
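The 4-D and 5-D lookups above map naturally onto nested dictionaries. The sketch below uses made-up keyspace, column family, and row names (`Blog`, `Users`, `Posts`, `jsmith`) purely for illustration; each column value is stored as a `(value, timestamp)` pair, with the dictionary key playing the role of the column name.

```python
# Standard CF: [Keyspace][ColumnFamily][Key][Column]
standard_cf = {
    "Blog": {
        "Users": {
            "jsmith": {
                "email": ("jsmith@example.org", 1),
                "name": ("John", 1),
            }
        }
    }
}

# Super CF: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]
super_cf = {
    "Blog": {
        "Posts": {
            "jsmith": {
                "post1": {"title": ("Hello", 2), "body": ("First post", 2)},
            }
        }
    }
}


def get_column(store, keyspace, column_family, key, column):
    """4-D lookup into a standard column family."""
    return store[keyspace][column_family][key][column]


def get_subcolumn(store, keyspace, column_family, key, super_column, sub_column):
    """5-D lookup into a super column family."""
    return store[keyspace][column_family][key][super_column][sub_column]


def row_columns(store, keyspace, column_family, key):
    """Column names of a row in sorted order, as rows are sorted collections."""
    return sorted(store[keyspace][column_family][key])
```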
REPLICATION & CONSISTENCY
  - Replication: number of copies of the data in the system.
  - Consistency level: number of replicas that must respond.
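The interplay between the replication factor and consistency levels comes down to simple arithmetic. The sketch below uses hypothetical helper names: a quorum is a majority of replicas, and a read is guaranteed to overlap with the latest write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (N).

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor
```

With a replication factor of 3, reading and writing at quorum (2 replicas each) gives overlapping sets, while reading and writing a single replica each does not, which is where eventual consistency comes from.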
REPLICA PLACEMENT STRATEGY
  Simple Strategy
    - Rack-unaware
    - Fast
    - Single data center (D.C.)
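A rack-unaware placement of this kind can be sketched as walking the ring clockwise from the node that owns the key. The function name and the representation of the ring as an ordered list of node names are assumptions for illustration.

```python
def simple_strategy_replicas(ring_nodes, primary_index, replication_factor):
    """Pick the primary node and the next nodes clockwise, ignoring racks."""
    count = min(replication_factor, len(ring_nodes))
    return [
        ring_nodes[(primary_index + offset) % len(ring_nodes)]
        for offset in range(count)
    ]
```

For a ring of nodes A to F with replication factor 3, a key owned by E would be replicated to E, F, and A. No topology lookup is needed, which keeps placement fast, but nothing prevents all replicas from sharing a rack.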