Why talk about Cassandra
● Audience Platform
● We use Hadoop and HBase for batch processing
● We need a real-time database for online random access. Candidates:
  – HBase
  – MySQL
  – MongoDB
  – Cassandra
Our needs
● Provide an interface that supports random access.
● Periodically synchronize data from HBase.
● Eliminate single points of failure.
● Store data in multiple data centers for data locality and disaster recovery.
Our selections
● Hadoop and HBase for processing prediction data in batch.
● Cassandra for real-time queries.
Apache Cassandra
● Inspired by Amazon Dynamo and Google BigTable
● Originally developed by Facebook
● An Apache top-level project
● Used by Twitter, Netflix, Rackspace, ...
What defines Cassandra
● High availability
● Eventual consistency
● Decentralized cluster
● BigTable-like data model
● High write throughput
Why we chose Cassandra
● It has great write performance; writes are faster than reads.
● Decentralized architecture with no single point of failure.
● Managing a Cassandra cluster is simple.
● Replication configuration is flexible and supports clusters spanning multiple data centers.
● Occasional inconsistency can be tolerated.
Cassandra vs. other DBs

                Cassandra        MongoDB                     HBase             MySQL
Data model      BigTable-like    Document                    BigTable-like     Table and row
CAP             AP               CP                          CP                CA
Cluster         P2P              M-S replication & sharding  Hadoop            Built-in M-S replication
Optimized for   Write            Read                        Batch job         Read
Query           By key or scan   Multi-indexed               By key or scan    Multi-indexed
Protocol        Thrift           Client                      REST or Thrift    Client
Data model
● Cassandra has a BigTable-like data model, but it is not identical.

Keyspace
  Column Family 1
    Row 1
      Column 1 → Value 1
      Column 2 → Value 2
    Row 2
      Column 1 → Value 1
      Column 2 → Value 2
  Column Family 2
    ...
  Super Column Family
    Row 1
      Super Column 1
        Column 1 → Value 1
        Column 2 → Value 2
      Super Column 2
        Column 1 → Value 1
        Column 2 → Value 2
Comparison with MySQL

Cassandra        MySQL
Keyspace         Database (or schema)
Column Family    Table
Row              Row
Column           Field
Value            Value
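The nesting above can be pictured with plain Python dictionaries. This is only a mental model, not driver code; all names are hypothetical.

```python
# A minimal sketch of Cassandra's nested data model:
# keyspace -> column family -> row key -> column -> value.
keyspace = {
    "ColumnFamily1": {                      # ~ a MySQL table
        "row1": {"column1": "value1",       # ~ fields of one row
                 "column2": "value2"},
        "row2": {"column1": "value1",
                 "column2": "value2"},
    },
    "SuperColumnFamily": {                  # super columns add one more level
        "row1": {
            "super1": {"column1": "value1", "column2": "value2"},
            "super2": {"column1": "value1", "column2": "value2"},
        },
    },
}

# Lookup is always by keys, outermost-in:
print(keyspace["ColumnFamily1"]["row1"]["column2"])   # value2
```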
Cluster communication
● Cassandra uses a gossip protocol to discover location and state information about the other nodes in a cluster.
● When a node first starts up, it contacts its seed nodes to begin gossiping. A node remembers the other nodes it has gossiped with.
● Failure states are tracked automatically through the heartbeats carried by gossip.
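A toy sketch of the gossip idea, under stated assumptions: each node holds a map of the newest heartbeat version it has seen per peer, and a gossip exchange merges two such views. For determinism the sketch gossips with the clockwise neighbour; real gossip picks random live peers (seeds at startup), and this is not Cassandra's actual implementation.

```python
def merge(view_a, view_b):
    """Both sides keep the newest heartbeat they have seen per node."""
    return {n: max(view_a.get(n, 0), view_b.get(n, 0))
            for n in set(view_a) | set(view_b)}

def gossip_round(views):
    # Deterministic stand-in: each node exchanges state with its
    # clockwise neighbour; both sides keep the merged view.
    nodes = sorted(views)
    for i, a in enumerate(nodes):
        b = nodes[(i + 1) % len(nodes)]
        views[a] = views[b] = merge(views[a], views[b])

views = {n: {n: 1} for n in "ABCD"}   # each node knows only itself
views["A"]["A"] = 5                   # A bumps its own heartbeat
gossip_round(views)
print(all(v.get("A") == 5 for v in views.values()))   # True
```

After a single round every node has learned A's newest heartbeat, because each merge updates both participants in place.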
Data partitioning
● Data in the cluster is represented as a ring.
● The ring is divided into as many ranges as there are nodes; each node is responsible for one (or more) ranges of the overall data.
● Each node has a token, and the token determines the node's position on the ring.
● In the configuration file you can set the initial token for each node, or have it generated automatically.
Data partitioning
● For example, a 4-node cluster covering a token range of 0 to 100.
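The 4-node example can be sketched as follows. The token values (0, 25, 50, 75) and node names are illustrative assumptions, and the modulo hash is a stand-in for Cassandra's real partitioner.

```python
from bisect import bisect_right

# Each node's token marks the start of the range it owns on the ring.
tokens = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]

def owner(key_hash):
    """A key belongs to the node with the largest token <= its hash."""
    positions = [t for t, _ in tokens]
    i = bisect_right(positions, key_hash % 100) - 1
    return tokens[i][1]

print(owner(13))   # node1  (range 0-24)
print(owner(60))   # node3  (range 50-74)
print(owner(99))   # node4  (range 75-99)
```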
Multiple datacenters
● In multi-datacenter deployments, replica placement ensures that each data center holds a complete copy of the data.
Partitioning with replication
● The total number of replicas across the cluster is referred to as the replication factor.
  replication_factor = 3
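A minimal sketch of replica placement in the style of the simple strategy: the primary owner takes the key, and the remaining replicas are the next nodes clockwise around the ring. Tokens and node names are the same illustrative assumptions as above.

```python
# Illustrative 4-node ring, token range 0-100.
tokens = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]

def replicas(key_hash, rf=3):
    positions = [t for t, _ in tokens]
    # Index of the primary owner: largest token <= the key's hash.
    i = max(j for j, t in enumerate(positions) if t <= key_hash % 100)
    # Then take the next rf - 1 nodes clockwise around the ring.
    return [tokens[(i + k) % len(tokens)][1] for k in range(rf)]

print(replicas(60))   # ['node3', 'node4', 'node1']
```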
Snitch
● The snitch defines how nodes are grouped together within the network topology.
● Cassandra uses this information to route inter-node requests.
● The snitch does not affect requests between the client application and Cassandra, and it does not control which node a client connects to.
Writes in Cassandra
● Consistency levels: ZERO, ANY, ONE, QUORUM, ALL
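Each level determines how many replica acknowledgements a write waits for before succeeding. The sketch below is an illustration of that rule, not driver code; QUORUM is a strict majority of the replicas.

```python
def required_acks(level, replication_factor):
    """How many replicas must acknowledge a write at each level."""
    return {
        "ZERO":   0,                            # fire and forget (old releases)
        "ANY":    1,                            # any node, incl. hinted handoff
        "ONE":    1,                            # one replica must ack
        "QUORUM": replication_factor // 2 + 1,  # majority of replicas
        "ALL":    replication_factor,           # every replica must ack
    }[level]

print(required_acks("QUORUM", 3))  # 2
print(required_acks("ALL", 3))     # 3
```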
Underlying storage
● Cassandra is optimized for write throughput. Writes go first to a commit log, and then to a memtable.
● Writes are batched in memory and periodically flushed to disk into a persistent structure called an SSTable.
● A row may be stored across multiple SSTable files. At read time, the row must be combined from all SSTables on disk to produce the requested data.
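The write and read paths above can be sketched as a toy store: writes land in an in-memory memtable, which is flushed into an immutable "SSTable", and a read merges all SSTables plus the memtable with the newest value winning. The flush threshold and class layout are illustrative assumptions, not Cassandra's real structures.

```python
class Store:
    def __init__(self, flush_at=2):
        self.memtable, self.sstables, self.flush_at = {}, [], flush_at

    def write(self, row, column, value):
        self.memtable.setdefault(row, {})[column] = value
        if len(self.memtable) >= self.flush_at:   # periodic flush to disk
            self.sstables.append(self.memtable)   # immutable from now on
            self.memtable = {}

    def read(self, row):
        merged = {}
        for sstable in self.sstables:              # oldest first
            merged.update(sstable.get(row, {}))
        merged.update(self.memtable.get(row, {}))  # newest value wins
        return merged

s = Store()
s.write("user1", "name", "alice")
s.write("user2", "name", "bob")      # fills the memtable: triggers a flush
s.write("user1", "city", "tokyo")    # same row, lands in a newer memtable
print(s.read("user1"))               # {'name': 'alice', 'city': 'tokyo'}
```

Note that "user1" is now fragmented across an SSTable and the memtable, which is exactly why a read must combine them.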
Deletes
● Deleted data is not immediately removed from disk.
● Instead, a marker called a tombstone is written to indicate the column's new status.
● Columns marked with a tombstone exist for a configured time period and are permanently deleted by the compaction process after that time has expired.
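A minimal sketch of delete-by-tombstone: a delete writes a marker rather than removing data, and the column is only dropped once the marker is older than a grace period. The `GC_GRACE` name and value here are illustrative assumptions.

```python
GC_GRACE = 10                                 # seconds, for illustration only

def delete(row, column, now):
    row[column] = ("TOMBSTONE", now)          # write a marker, remove nothing

def compact_column(value, now):
    """Return the surviving value, or None once the tombstone expires."""
    if isinstance(value, tuple) and value[0] == "TOMBSTONE":
        if now - value[1] >= GC_GRACE:
            return None                        # permanently dropped
        return value                           # kept until the grace passes
    return value

row = {"name": "alice"}
delete(row, "name", now=100)
print(compact_column(row["name"], now=105))   # still a tombstone
print(compact_column(row["name"], now=111))   # None: purged after the grace
```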
Compaction
● Since SSTables are immutable, Cassandra periodically merges SSTables together using a process called compaction.
● Compaction merges row fragments together and removes expired tombstones. Because SSTables are sorted by row key, this merge is efficient.
● During compaction there is a temporary spike in disk space usage and disk I/O.
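The merge can be sketched as follows, under stated assumptions: SSTables are dicts of row key to columns, each column holds a `(value, timestamp)` tuple, and `value is None` marks a tombstone. Because the inputs are sorted by row key, the combine is a linear sorted merge, which is the efficiency point made above.

```python
import heapq

def compact(sstables, now, gc_grace=10):
    merged = {}
    # heapq.merge preserves row-key order across the sorted inputs.
    for row_key, columns in heapq.merge(
            *(sorted(t.items()) for t in sstables), key=lambda kv: kv[0]):
        out = merged.setdefault(row_key, {})
        for col, (value, ts) in columns.items():
            if col not in out or out[col][1] < ts:   # newest fragment wins
                out[col] = (value, ts)
    # Drop tombstones older than the grace period.
    for columns in merged.values():
        for col in [c for c, (v, ts) in columns.items()
                    if v is None and now - ts >= gc_grace]:
            del columns[col]
    return merged

old = {"r1": {"a": ("x", 1), "b": ("y", 1)}}
new = {"r1": {"a": (None, 5)}}                # tombstone for column a
print(compact([old, new], now=20))            # {'r1': {'b': ('y', 1)}}
```

With an earlier `now`, the tombstone is still within its grace period and survives the merge instead of being purged.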
Limitations of Cassandra
● All data for a single row is stored on a single machine in the cluster, which puts an upper bound on the amount of data associated with a key.
● A single column value may not be larger than 2 GB.
● The maximum number of columns per row is 2,000,000,000 (2 billion).
● Key and column names must be under 64 KB.
Other features
● Secondary indexes
● Load balancing
● Column TTL
● Thrift, Avro and CQL interfaces
● Hadoop integration