This document provides an overview and introduction to Apache Cassandra, including:
- Cassandra is a distributed database designed to handle large amounts of structured data across commodity servers. It provides high availability with no single point of failure and linear scalability.
- Cassandra uses a ring topology and consistent hashing to distribute data evenly across nodes. Data is stored in tables with rows mapped to partitions that are replicated across the ring for fault tolerance.
- The write path involves writing to memtables, committing to the commit log for durability, flushing memtables to SSTables, and periodic compaction. The read path merges data from memtables, SSTables, commit log and caches for fast retrieval.