Practical Cassandra
  NoSQL key-value vs RDBMS – why and when
  Cassandra architecture
  Cassandra data model
  Life without joins, or HDD space is cheap today
  Hardware requirements & deployment hints
Vitalii Tymchyshyn, email@example.com, @tivv00
RDBMS problems
  Sometimes you reach the point where a single server can't cope
  Relational Replication
    Not write-scalable
    Data is not instantly visible
  Sharding
    No foreign keys or joins
    No transactions
    Reduced reliability (multiple servers)
  Schema update is a pain
Cassandra NoSQL
  Master-master replication + sharding in one bottle
  Peer-to-peer architecture (no SPOF)
  Easy cluster reconfiguration
  Eventual consistency as a standard
  All data in one record – no need to join
  Flexible schema
Our data
  We have an intelligent Internet cache
  Intelligent means we don't cache everything, or we would need Google's DC
  It's still hundreds of millions of sites
  And tens of TB of packed data
  Randomly updated
  Analysis must be able to process all of this in a matter of hours
Ring partitioner types
  Order Preserving
    Each server serves a key range
    Range queries possible
    Read/Write/Disk space hot spots possible
    Complex to fix key range
  Random
    Data is smoothly distributed on servers
    No range queries
    No hot spots
    Fixed key range
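To make the trade-off concrete, here is a minimal sketch of key placement on a ring under both partitioner types. It is not Cassandra's code: the `Ring` class, the node names and the token values are made up for illustration, and the only assumption taken from Cassandra is that the random partitioner derives tokens from an MD5 hash of the key.

```python
import hashlib
from bisect import bisect_left


def md5_token(key):
    """Random-partitioner style token: a 128-bit number derived from the key."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class Ring:
    """Hypothetical ring: each node owns the token range that ends at its token."""

    def __init__(self, node_tokens, token_of):
        self.token_of = token_of
        self.ring = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    def node_for(self, key):
        i = bisect_left(self.tokens, self.token_of(key))
        return self.ring[i % len(self.ring)][1]  # wrap around past the last token


# Order preserving: the key itself is the token, so neighbouring keys share a node.
ordered = Ring({"node1": "g", "node2": "p", "node3": "z"}, token_of=lambda k: k)

# Random: tokens split the MD5 space evenly, so neighbouring keys scatter.
step = 2**128 // 3
hashed = Ring({"node1": step, "node2": 2 * step, "node3": 2**128 - 1}, token_of=md5_token)

keys = ["apple", "apricot", "avocado", "banana"]
print([ordered.node_for(k) for k in keys])  # all on one node: range scan is cheap, hot spot is likely
print([hashed.node_for(k) for k in keys])   # placement decided by the hash, not by key order
```

With the ordered ring every key starting with "a" or "b" lands on the same node, which is what makes range queries possible and hot spots likely.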
Runtime CAP-solving
  The whole thing is about replication
  CAP: Consistency, Availability, Partition tolerance – choose two.
  With Cassandra you can choose at runtime.
Runtime CAP-solving
  Quorum read/write
  Fast writes
  Fast reads
  Fast, less consistency
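One way to read the four options above is as pairs of "how many replicas must acknowledge a write" (W) and "how many replicas a read asks" (R). The sketch below assumes 3 replicas and uses the usual overlap rule: a read is guaranteed to touch at least one replica holding the latest acknowledged write when W + R > N. The mapping of slide labels to (W, R) pairs is an interpretation, not something the slides spell out.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1


def read_sees_latest_write(n, w, r):
    """True when every read set must overlap every acknowledged write set."""
    return w + r > n


N = 3  # assumed number of replicas

# label -> (W, R); this mapping is an assumption made for illustration
LEVELS = {
    "quorum read/write": (quorum(N), quorum(N)),
    "fast writes": (1, N),
    "fast reads": (N, 1),
    "fast, less consistency": (1, 1),
}

for label, (w, r) in LEVELS.items():
    print(f"{label}: W={w} R={r} consistent={read_sees_latest_write(N, w, r)}")
```

Because W and R are chosen per operation, the same cluster can serve both the strict and the relaxed variants.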
Data model
  Keyspaces – much like databases in an RDBMS
  Column families – storage elements, like tables in an RDBMS
  Columns – you can have millions per row; names are flexible; still like columns in an RDBMS
  Super columns – columns that have structured content; superseded by composite columns
Example
  RDBMS: Twitter DB
    Users table: ID, Name, Birthday
    Tweets table: UserID, TweetID, TweetContent
  Cassandra: Twitter Keyspace
    Users CF: Key: User ID; columns: Name(Str), Birthday(Str)
    Timeline CF: Key: User ID; columns: <TweetID>(TweetContent)
Example (alternative)
  RDBMS: Twitter DB
    Users table: ID, Name, Birthday
    Tweets table: UserID, TweetID, TweetContent
  Cassandra: Twitter Keyspace
    Data CF: Key: User ID; columns: Name(Str), Birthday(Str), <TweetID>(TweetContent)
Example (data)
  Users
    ID  Name
    1   Tom
    2   John
  Tweets
    User  ID  Text
    1     1   Hello
    1     2   See me?
    2     3   See you!
  Data
    Key  Data
    1    Name = Tom, T_1 = Hello, T_2 = See me?
    2    Name = John, T_3 = See you!
Data model
  You can have the same key in multiple column families
  You can have a different set of columns for different keys in the same column family
  You can query a range of columns for a key (columns are sorted), with pagination
  You can have columns without values (and it's useful)
ACID vs BASE
  Superheroes are good, but not scalable. So, what do we lose?
No Atomicity
  You've got no transactions – no rollback
  The most you have is an atomic update to a single row
  A failed operation MAY be applied (that's why counters are not reliable)
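Why a "failed but applied" operation hurts counters in particular: a client that sees a failure will usually retry. Retrying a plain column write is harmless, retrying an increment counts twice. A toy sketch, where `FlakyReplica` and `with_retry` are invented for illustration and the replica is assumed to apply the mutation and then lose the first acknowledgement:

```python
class FlakyReplica:
    """Hypothetical replica: applies every mutation, but the first acknowledgement is lost."""

    def __init__(self):
        self.columns = {}
        self.counter = 0
        self._lose_next_ack = True

    def _ack(self):
        if self._lose_next_ack:
            self._lose_next_ack = False
            raise TimeoutError("mutation applied, acknowledgement lost")

    def set_column(self, name, value):
        self.columns[name] = value  # applied before the client hears back
        self._ack()

    def increment(self, delta):
        self.counter += delta  # applied before the client hears back
        self._ack()


def with_retry(operation, *args):
    """Client-side policy: on a timeout, send the same operation once more."""
    try:
        operation(*args)
    except TimeoutError:
        operation(*args)


a = FlakyReplica()
with_retry(a.set_column, "Name", "Tom")  # same value written twice: no harm

b = FlakyReplica()
with_retry(b.increment, 1)  # incremented twice: over-count

print(a.columns, b.counter)
```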
Eventual Consistency
  Cassandra has no central governor
  This means no bottleneck
  This also means no one knows whether the database as a whole is consistent
  Regular repair is your friend!
No Isolation
  All mutations are timestamped to restore order from chaotic arrival
  You MUST keep your clocks synchronized
  That's how operations are applied on the server :)
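A minimal sketch of "timestamp wins" reconciliation, assuming each column stores a (value, timestamp) pair and ignoring ties. The function names and the sample timestamps are made up. It shows both halves of the slide: arrival order does not matter, and a client with a slow clock silently loses.

```python
def apply_mutation(row, name, value, timestamp):
    """Keep the value with the highest timestamp, whatever order mutations arrive in."""
    current = row.get(name)
    if current is None or timestamp > current[1]:
        row[name] = (value, timestamp)


def replay(mutations):
    """Apply (name, value, timestamp) mutations in the given arrival order."""
    row = {}
    for name, value, timestamp in mutations:
        apply_mutation(row, name, value, timestamp)
    return row


writes = [("Name", "Tom", 100), ("Name", "Thomas", 200)]

in_order = replay(writes)
out_of_order = replay(reversed(writes))  # same result despite chaotic arrival

# Sent last, but from a client whose clock runs behind: stamped 150, so it is dropped.
skewed = replay(writes + [("Name", "Tommy", 150)])

print(in_order, out_of_order, skewed)
```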
Controlled Durability
  Cassandra uses a transaction log to ensure durability on a single server
  Durability of the whole database depends on both the total number of replicas and the number of replicas each write operation must reach
  Remember: with 99% uptime for a single server, "full cluster working" uptime for 100 servers is 36.6% (0.99^100) – most of the time you've got at least one server down!
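The 36.6% figure is a one-line calculation, assuming servers fail independently of each other (the function name is just for illustration):

```python
def all_up_probability(single_server_uptime, servers):
    """Probability that every server is up at the same moment (independent failures assumed)."""
    return single_server_uptime ** servers


p = all_up_probability(0.99, 100)
print(f"{p:.1%}")  # 36.6%
```

Put the other way round, roughly 63% of the time at least one of the 100 servers is down, so the cluster has to keep working in that state.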
Data querying
  With SQL you simply ask.
  You can easily scan the whole DB
  Indexes may help
  Any calculation is repeated each time
  This can be slow on read
Data querying
  With NoSQL you can't efficiently scan the whole DB
  No "group by" or "order by"
  You must prepare your data beforehand
  You have multiple copies of data
  You must recalculate when application logic changes
  The precalculated reads are fast
Think about your queries in advance!
  There is no "I'll simply add an index, some hints and my query will become fast"
  Any index is created and maintained from application code
  Cassandra now has secondary indexes, but they are much inferior to custom ones
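What "maintained from application code" can look like, sketched on dictionaries rather than a real client. The column family names and helper functions are hypothetical. The index is simply a second column family whose row key is the indexed value and whose columns are value-less user IDs.

```python
users_cf = {}          # row key: user ID -> {column name: value}
users_by_name_cf = {}  # hypothetical index CF, row key: name -> {user ID: ""}


def save_user(user_id, name):
    """Write the user and keep the index in step; both writes are the application's job."""
    old_name = users_cf.get(user_id, {}).get("Name")
    if old_name is not None and old_name != name:
        users_by_name_cf.get(old_name, {}).pop(user_id, None)  # drop the stale index entry
    users_cf.setdefault(user_id, {})["Name"] = name
    users_by_name_cf.setdefault(name, {})[user_id] = ""  # column without a value


def find_user_ids(name):
    """One row read in the index CF instead of a scan over all users."""
    return sorted(users_by_name_cf.get(name, {}))


save_user("1", "Tom")
save_user("2", "John")
save_user("3", "Tom")
print(find_user_ids("Tom"))
```

Note that the two writes are not atomic together, so a failure between them leaves the index and the data out of step until the application fixes it.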
What's wrong with secondary indexes
  They work on fixed column names
  They are consistent with data
  This means they live near the data they index
  This means they are distributed between nodes by row key, not by indexed column value
  This means you need to ask every node to get a single value
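The chain of "this means" can be simulated in a few lines. The toy cluster below is an assumption-heavy sketch (four nodes, MD5 placement by row key, insert-only rows, invented class and function names), but it shows the cost: an index stored next to its rows gives no hint about which node holds a match, so a lookup contacts all of them.

```python
import hashlib

NODE_COUNT = 4  # hypothetical cluster size


def node_for(row_key):
    """Rows are placed by a hash of the row key, not by any column value."""
    digest = hashlib.md5(row_key.encode()).digest()
    return int.from_bytes(digest, "big") % NODE_COUNT


class Node:
    def __init__(self):
        self.rows = {}         # row key -> name
        self.local_index = {}  # name -> row keys stored on THIS node only

    def put(self, row_key, name):
        self.rows[row_key] = name
        self.local_index.setdefault(name, set()).add(row_key)


nodes = [Node() for _ in range(NODE_COUNT)]


def put(row_key, name):
    nodes[node_for(row_key)].put(row_key, name)


def find_by_name(name):
    """Ask every node's local index; return the matches and the number of nodes contacted."""
    found, contacted = set(), 0
    for node in nodes:
        contacted += 1
        found |= node.local_index.get(name, set())
    return found, contacted


for row_key, name in [("1", "Tom"), ("2", "John"), ("3", "Tom")]:
    put(row_key, name)

print(find_by_name("Tom"))
```

A custom index row keyed by the name itself would live on exactly one node, which is the difference the slide is pointing at.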
Peer-to-peer replication
  Your operation can return OK even if it was not written to every replica
  Hinted handoff will try to repair later
  Even if your operation has failed, it may have been written to some replicas
  This inconsistency won't be repaired automatically
  These are drawbacks of the "no master" architecture
  You need to repair regularly!
Tombstones and Repair
  Delete events are recorded as tombstones to ensure that "before delete" data arriving later won't be used
  Regular repair not only makes sure your data is replicated, but also that your deletes are replicated.
  If you don't repair, beware of ghosts!
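Where the ghosts come from, as a small sketch. It assumes columns are (value, timestamp) pairs reconciled by newest timestamp, and that tombstones are eventually purged from disk; the function names and the two-replica scenario are invented for illustration.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value


def merge(a, b):
    """Reconcile two replica copies of a row: per column, the newest timestamp wins."""
    merged = dict(a)
    for name, (value, timestamp) in b.items():
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


def visible(row):
    """What a reader sees: tombstoned columns are hidden."""
    return {name: value for name, (value, _) in row.items() if value is not TOMBSTONE}


def purge_tombstones(row):
    """Tombstones are dropped eventually to reclaim space."""
    return {name: cell for name, cell in row.items() if cell[0] is not TOMBSTONE}


replica_a = {"T_1": (TOMBSTONE, 20)}  # received the delete
replica_b = {"T_1": ("Hello", 10)}    # missed the delete, still holds the old value

# Repaired in time: the tombstone is newer, so the old value stays dead.
repaired = visible(merge(replica_a, replica_b))

# Tombstone purged before any repair: nothing outvotes the old value any more.
ghost = visible(merge(purge_tombstones(replica_a), replica_b))

print(repaired, ghost)
```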
Resources & Environment
  Disk space requirements
  Memory requirements
  Native plugins & configuration
Disk estimations
  Say we've got 1 TB of data
  Replication factor 3 makes it 3 TB
  Data duplication makes it 12 TB
  Tombstones/repair space makes it 24 TB
  Backups make it 36 TB
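The same estimate as a reusable calculation. Only the replication factor of 3 is stated outright; the other multipliers (x4, x2, x1.5) are inferred from the running totals on the slide, so treat them as this deck's example numbers, not general constants.

```python
# (step, multiplier); multipliers other than 3 are inferred from the slide's totals
STEPS = [
    ("replication factor 3", 3),
    ("data duplication", 4),
    ("tombstones/repair space", 2),
    ("backups", 1.5),
]


def disk_estimate(raw_tb, steps=STEPS):
    """Running total in TB after each step, as (step, total) pairs."""
    total = raw_tb
    trail = []
    for label, factor in steps:
        total *= factor
        trail.append((label, total))
    return trail


for label, total in disk_estimate(1):
    print(f"{label}: {total:g} TB")
```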
Memory estimations
  Cassandra has certain in-memory structures whose size grows linearly with the amount of data
  Key and row caches are configured at column family level. Change the defaults if you've got a lot of CFs
  Bloom filters and the key samples cache are configured globally in the latest versions
  Estimate a minimum of RAM equal to ~0.5% of your data amount
Native specifics
  Cassandra (like many other large things) likes JNA. Please install it.
  Cassandra maps files to memory – the Cassandra process's virtual and resident memory size will grow because of mmap.
  Default heap sizes are large – tame them if Cassandra is not the only task on the host