My INSURER PTE LTD - Insurtech Innovation Award 2024
Cassandra database design best practises
1. Key points to look at while designing:
1. Distributed No shared storage
No Single Point of Failure SPOF
2. Linear Scalability
3. ACID Vs CAP Theorem
4. DB write speed more than read speed
5. Mitigation for sequence joins and
aggregation by Hive for speed spark.
6. Polyglot of persistence
2. • Expiring Columns with TTL
• Create row key for speed & use wide rows for
efficiency
• Row level isolation (Rowkey wide rows)
• Atomic batches
(use timestamp to present order SSTable )
• Log structured Storage: Regular compaction
Commit log (durability) memtable (in
memory){flush when full} SSTable
(Compaction)
3. • Tunable data consistency by consistency
strategy {Any|One|Quorum|Local
Quorum|Each Quorum}
Quorum (replication_factor+1)/2
• Super Column limitation: require
deseriailize all column for processing
• Counter Column: incrementally counts
occurrence of particular events
4. • Study Query denormalize by Access
pattern identify repeating columns as
keys. (depends on requirement)
• One to Many relationship Modelling E.g.
RDBMS: User table Video Table
C* : Break to 2 Column Families CF
CF1: Video(key videoid UUID)
CF2: Username_video_index
(key(username,videoid)
5. Many to Many Relationship modeling
Example:
Tables: Student m:n Teacher
CF1: Student_with_Teacherid
CF2: Teacher_with_Studentid
Model depends on Query pattern
6. • Look serialization cost of Collection
(Set|Map|List) and limit 64K.
• TimeSeries : CF ordered by
Timestamp+access pattern.
• Partial Word Indexes:
create partition + partial indexes
• Bit Map index: Example
Vehicle{make,model,color,vech_id,lot_id}
primary key {make,model,color,vech_id}
Row Key {make,model,color}
7. • Index interval:
smaller index faster seek time
properties:
index_interval keep optimum
• Immediate Consistency:
Nodes_written+nodes_read >
replication_factor
• Do not used index on high cardinality
columns,counter columns,frequently
updated columns.