At Intuit Data Engineering and Analytics, we work on multiple products and offerings, ranging from Profile Store to deeply personalized A/B testing platforms.
We built a scalable, highly available, responsive personalized A/B testing platform on AWS, with Cassandra as our NoSQL backend. In this session, I will cover how Intuit uses Cassandra on that platform, the concerns we faced, and what we learned. We hope that sharing both the issues and the lessons helps other Cassandra users prevent or mitigate the same problems.
Constant long garbage collection (GC) pauses
Repair takes a long time.
Potential data loss after decommissioning nodes
nodetool decommission logs errors silently.
OpsCenter had a performance impact on production.
Strange status: /etc/init.d/dse status showed DSE as running, but cqlsh would not start.
sstableloader does not stream SSTables when internode encryption is enabled.
Verify that the replication factor is set correctly.
Check the read and write consistency (quorum) levels; if different modules set them differently, confirm that the combination still gives the expected behavior.
Cassandra data models are denormalized, so create tables judiciously.
Index datetime fields if SELECT queries will need WHERE clauses on them.
Always monitor heap, garbage collection, and threads for Cassandra.
Always take fresh snapshots before attempting a restacking.
Have a data recovery strategy, such as regular snapshots copied to S3.
Configure cassandra.yaml and cassandra-env.sh correctly for GC and heap dumps.
Understand what nodetool cfstats, tpstats, compactionstats, and netstats can tell you.
Understand compaction and tombstones.
DSE 4.7 has good data migration capabilities and faster repair times.
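The read/write quorum check in the lessons above comes down to simple arithmetic. With replication factor RF, QUORUM means floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the most recent acknowledged write only when read replicas + write replicas > RF. The following is a minimal Python sketch of that rule, not code from our platform. It assumes a single datacenter and covers only the ONE, TWO, THREE, QUORUM, and ALL levels. The function names and the per-module settings are hypothetical and are not part of any Cassandra driver.

```python
# Hypothetical helpers (not a Cassandra driver API): check whether the read and
# write consistency levels chosen by a module overlap for a given replication factor.
# Assumes a single datacenter; LOCAL_* and EACH_* levels are not modeled.


def quorum(replication_factor: int) -> int:
    """Number of replicas required for QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def replicas_for(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a few common consistency levels."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    needed = levels[level.upper()]
    if needed > replication_factor:
        # Such a request can never succeed, which usually means a misconfigured RF.
        raise ValueError(
            f"{level} needs {needed} replicas but RF is {replication_factor}"
        )
    return needed


def is_strongly_consistent(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when R + W > RF, so every read set intersects every write set."""
    r = replicas_for(read_level, replication_factor)
    w = replicas_for(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    # Hypothetical per-module settings, all against a keyspace with RF = 3.
    modules = {
        "assignment-service": ("QUORUM", "QUORUM"),
        "reporting-job": ("ONE", "QUORUM"),
    }
    for name, (read_cl, write_cl) in modules.items():
        ok = is_strongly_consistent(read_cl, write_cl, 3)
        print(f"{name}: read={read_cl} write={write_cl} overlap={ok}")
```

With RF = 3, QUORUM reads and QUORUM writes touch 2 + 2 = 4 replicas, which is greater than 3, so they overlap. ONE reads with QUORUM writes touch only 1 + 2 = 3 and can return stale data. That is the kind of mismatch worth catching when different modules configure consistency independently.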
With support from DataStax, we had a great tax season and served our users well. There are still a few puzzling pieces that we are working through with DataStax. We hope this sharing helps other Cassandra users use it as effectively as possible!