Why Cassandra?

tyfs.rocks, 26.07.2017
tayfun.sevimli

The History of Cassandra

Where is Cassandra?

Cassandra Architecture – CAP Theorem
Cassandra was designed to fall in the "AP" intersection of
the CAP theorem, which states that a distributed system can
guarantee at most two of the following properties at the same time:
Consistency, Availability and Partition Tolerance. This makes
Cassandra a good fit for solutions that need a distributed
database that provides high availability and stays tolerant of
network partitions, for example when some nodes in the cluster
are offline, which is common in distributed systems.

Cassandra Architecture – Data Model
Cassandra is classified as a column-based (wide-column) database: its basic
structure for storing data is a set of columns, each consisting of a column key
(name) and a column value. Every row is identified by a unique key, called the
partition key (row key). Rows are grouped into column families, which are similar
to tables in a relational database (CQL calls them tables).

Cassandra Architecture – Data Model
SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>
• A map gives efficient key lookup, and its sorted nature gives efficient scans. In Cassandra, row keys and column
keys can be used for efficient lookups, and column keys for efficient range scans within a row. (With the default
partitioner, rows are ordered by the token, i.e. the hash, of the row key rather than by the key itself.)
• The number of column keys is unbounded, which means you can have wide rows.
• A column key can itself hold the data; in other words, you can have a valueless column.
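The nested sorted-map model can be sketched in Python. This is a minimal illustration that assumes string keys. `WideRowStore`, `put`, `get` and `slice` are hypothetical names and do not belong to Cassandra or any driver. Each row keeps its column keys sorted, so a slice over a key range is a range scan.

```python
from bisect import bisect_left, bisect_right, insort


class WideRowStore:
    """Toy model of SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>.

    Hypothetical illustration only; this is not a Cassandra or driver API.
    """

    def __init__(self):
        # row key -> (sorted list of column keys, dict of column key -> value)
        self._rows = {}

    def put(self, row_key, column_key, value=None):
        # value=None models a valueless column: the column key itself is the data
        keys, values = self._rows.setdefault(row_key, ([], {}))
        if column_key not in values:
            insort(keys, column_key)  # keep the row's column keys sorted on insert
        values[column_key] = value

    def get(self, row_key, column_key):
        # Point lookup: one map lookup for the row, one for the column
        return self._rows[row_key][1][column_key]

    def slice(self, row_key, start, end):
        # Range scan: all columns with start <= column key <= end, in sorted order
        keys, values = self._rows.get(row_key, ([], {}))
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        return [(k, values[k]) for k in keys[lo:hi]]


if __name__ == "__main__":
    store = WideRowStore()
    store.put("user:42", "2017-07-03", "logout")
    store.put("user:42", "2017-07-01", "login")
    store.put("user:42", "2017-07-02", "click")
    store.put("user:42", "tag:admin")  # valueless column
    print(store.slice("user:42", "2017-07-01", "2017-07-02"))
    # [('2017-07-01', 'login'), ('2017-07-02', 'click')]
```

The slice is cheap because the column keys are already sorted. Cassandra gets the same property from its sorted on-disk SSTables rather than from in-memory lists.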

Cassandra Architecture – Write Path
Cassandra Write Path
• Every node first writes the mutation to the commit log
and then writes the mutation to the memtable.
• Writing to the commit log ensures durability of the write,
as the memtable is an in-memory structure and is only
written to disk when the memtable is flushed. A
memtable is flushed to disk when:
  - It reaches its maximum allocated size in memory
  - The configured maximum time a memtable may stay in
memory elapses
  - A user flushes it manually (for example with nodetool flush)
• A memtable is flushed to an immutable structure called
an SSTable (Sorted String Table). The commit log is used
for replay in case data from the memtable is
lost due to node failure.
• Every SSTable consists of several files on disk, including a
bloom filter, a key index and a data file.
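The write path can be sketched in Python under several simplifying assumptions:

- The commit log is a plain list standing in for an append-only file.
- SSTables are held in memory as key-sorted tuples.
- The flush threshold is a hypothetical entry count rather than Cassandra's memory-based limit.

`ToyNode` and its methods are illustrative names only.

```python
class ToyNode:
    """Hypothetical write path: commit log first, then memtable, flush to immutable SSTables."""

    def __init__(self, commit_log, memtable_limit=3):
        self.commit_log = commit_log          # list standing in for the on-disk, append-only commit log
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries, not bytes)
        self.memtable = {}                    # in-memory, mutable
        self.sstables = []                    # flushed, immutable, key-sorted runs

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability: append the mutation to the commit log
        self.memtable[key] = value            # 2. then apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as an immutable, key-sorted SSTable ...
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # ... after which its commit log entries are no longer needed for replay
        self.commit_log.clear()

    def replay(self):
        # After a crash the memtable is gone; rebuild it by replaying the commit log
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


if __name__ == "__main__":
    log = []
    node = ToyNode(log)
    for key, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
        node.write(key, value)
    print(node.sstables)       # [(('a', 1), ('b', 2), ('c', 3))]
    print(node.memtable)       # {'d': 4}
    restarted = ToyNode(log)   # simulate a crash: the memtable is lost, the commit log survives
    restarted.replay()
    print(restarted.memtable)  # {'d': 4}
```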

Cassandra Architecture – Read Path
Cassandra Read Path
• Every column family stores data in a number of
SSTables, so data for a particular row can be located in
several SSTables as well as in the memtable. For every
read request Cassandra therefore reads the row from each
SSTable of the column family that may contain it (bloom
filters let it skip SSTables that definitely do not) and
scans the memtable for applicable data fragments.
These fragments are then merged and returned to the
coordinator.
• If the contacted replicas have different versions of the
data, the coordinator returns the latest version (the one
with the newest write timestamp) to the client and issues
a read repair to the node(s) holding the older version.
The read repair operation pushes the newer version of
the data to those nodes.
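The merge and read-repair steps can be sketched in Python. The sketch assumes that every stored value carries a write timestamp and that the newest timestamp wins (last-write-wins). Each replica is modelled as a plain dict mapping a key to `(timestamp, value)`. `merge_fragments`, `local_read` and `coordinator_read` are hypothetical helpers, and the bloom-filter check is left out.

```python
def merge_fragments(fragments):
    """Merge (timestamp, value) fragments for one cell; the newest timestamp wins."""
    present = [f for f in fragments if f is not None]
    return max(present, key=lambda f: f[0]) if present else None


def local_read(memtable, sstables, key):
    # A replica collects the fragments from its memtable and SSTables and merges them
    return merge_fragments([memtable.get(key)] + [s.get(key) for s in sstables])


def coordinator_read(replicas, key):
    """Read key from every contacted replica, return the newest version, repair stale replicas."""
    responses = [replica.get(key) for replica in replicas]
    latest = merge_fragments(responses)
    if latest is not None:
        for replica, response in zip(replicas, responses):
            if response != latest:
                replica[key] = latest  # read repair: push the newer version to the stale replica
    return latest


if __name__ == "__main__":
    print(local_read({"k": (3, "mem")}, [{"k": (1, "a")}, {"k": (7, "b")}, {}], "k"))  # (7, 'b')
    r1, r2, r3 = {"k": (1, "old")}, {"k": (5, "new")}, {}
    print(coordinator_read([r1, r2, r3], "k"))  # (5, 'new')
    print(r1, r3)  # {'k': (5, 'new')} {'k': (5, 'new')}
```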

Cassandra Architecture – Cluster Topology
Cluster Concepts
• a node is a Cassandra instance (in
production: one node per machine)
• a partition is one ordered and replicable
unit of data on a node
• a rack is a logical set of nodes
• a data center is a logical set of racks
• a cluster is the full set of nodes, which
map to a single complete token ring
• nodes communicate peer-to-peer using the gossip
protocol
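Partition placement on the token ring can be sketched in Python under these assumptions:

- Tokens come from an MD5 hash, loosely modelled on the older RandomPartitioner. The default Murmur3Partitioner uses MurmurHash3 instead.
- Each node owns a single token.
- Replicas go on the next nodes clockwise, as with SimpleStrategy. Racks and data centers are ignored.

`TokenRing` is a hypothetical name.

```python
import hashlib
from bisect import bisect_left


def token(partition_key: str) -> int:
    # Hash the partition key onto the ring (first 8 bytes of MD5 -> 0 .. 2**64 - 1)
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Hypothetical single-token-per-node ring with SimpleStrategy-style replica placement."""

    def __init__(self, node_tokens):
        # node_tokens: node name -> token; the ring is ordered by token
        self.ring = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.ring]

    def replicas_for_token(self, tok, rf):
        # The primary replica is the first node whose token is >= tok (wrapping past the
        # largest token back to the smallest); the remaining rf - 1 replicas follow clockwise.
        start = bisect_left(self.tokens, tok) % len(self.ring)
        count = min(rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def replicas(self, partition_key, rf):
        return self.replicas_for_token(token(partition_key), rf)


if __name__ == "__main__":
    ring = TokenRing({"n1": 0, "n2": 2**62, "n3": 2**63})
    print(ring.replicas_for_token(1, 2))          # ['n2', 'n3']
    print(ring.replicas_for_token(2**63 + 5, 2))  # ['n1', 'n2'] (wraps around the ring)
    print(ring.replicas("user:42", 2))
```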

Cassandra Architecture – Data Consistency
Tunable Data Consistency
How many replicas must acknowledge a
read/write request
• choose anywhere between strong and
eventual consistency
• possible CLs: ANY (writes only), ONE, QUORUM
(floor(RF/2) + 1), ALL
• tunable per request
• multi-data-center support
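The arithmetic behind these levels can be sketched in Python for a single data center. The sketch uses the common rule that a read sees the latest write when the read and write replica sets must overlap, that is R + W > RF. ANY is treated as needing no replica acknowledgement, because a stored hint can satisfy it. The helper names are hypothetical.

```python
def quorum(rf: int) -> int:
    # QUORUM = floor(RF / 2) + 1
    return rf // 2 + 1


def replicas_required(level: str, rf: int, is_write: bool) -> int:
    """Replica acknowledgements a consistency level needs (single data center, hypothetical helper)."""
    if level == "ANY":
        if not is_write:
            raise ValueError("ANY is only valid for writes")
        return 0  # a hint stored by the coordinator is enough; no replica has to acknowledge
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return quorum(rf)
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # Every read set overlaps every write set when R + W > RF
    r = replicas_required(read_level, rf, is_write=False)
    w = replicas_required(write_level, rf, is_write=True)
    return r + w > rf


if __name__ == "__main__":
    print(quorum(3), quorum(5))                           # 2 3
    print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True  (2 + 2 > 3)
    print(reads_see_latest_write("ONE", "ONE", 3))        # False (1 + 1 <= 3)
```

QUORUM for both reads and writes is the usual middle ground. ONE/ONE trades that guarantee for lower latency, which is the "eventual" end of the range.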

Cassandra Architecture – CQL Language
Cassandra Query Language
• very similar to RDBMS SQL syntax
• create objects via DDL
• core DML commands INSERT,
UPDATE and DELETE are supported
• query data with SELECT commands

Cassandra Architecture – Security
Cassandra Security Features
• Authentication based on internally
controlled role names/passwords
• Authorization based on object
permission management
• Authentication and authorization
for JMX, based on
usernames/passwords
• SSL encryption

Why Cassandra ?
• Scales linearly with massive write loads
  - Cassandra can handle very large amounts of data, and adding nodes increases throughput. This is why it is popular
with companies that provide mobile phone and messaging services, which hold huge amounts of data.
• Highly Fault Tolerant
  - Masterless cluster with no single point of failure. With appropriate replication, users need not notice if a server, an
entire rack of servers, or even an entire data center fails. Zero-downtime rolling upgrades are also possible.
• Easy Replication / Data Distribution
• Homogeneous Environment
  - No master-slave or sharding setup; all nodes in the ring are equal.
• Ease of Administration
  - Masterless and fault-tolerant; supports temporary loss of nodes with minimal impact on production performance.
• Wide Community

Use Cases of Cassandra
• Messaging & Event Sourcing
  - Cassandra can handle very large amounts of data, so it is preferred by companies that provide mobile phone and
messaging services, which hold huge amounts of data.
• IoT & High Speed Applications
  - Cassandra can ingest data at high speed, so it suits applications where data arrives very quickly from many
different devices or sensors.
• Product Catalogs and Retail Apps
  - Many retailers use Cassandra for durable shopping carts and fast product catalog reads and writes.
• Social Media Analytics & Recommendations
  - Many online companies and social media providers use Cassandra for analytics and for recommendations to their
customers.

Cassandra for Akka Persistence
• Linear scalability
  - Expected massive load
• No SPOF
  - Fault-tolerant, resilient
• Always-On Multi-Data Center
  - Data distribution & replication
  - Cluster over multiple data centers
• Akka Persistence
  - CQRS with event sourcing
  - Up-to-date plugin supported by Akka (Lightbend)
• Akka Streams
  - Batch processing over streaming
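CQRS with event sourcing can be illustrated with a small Python sketch:

- Commands are validated.
- Each accepted event is appended to a journal, keyed by a persistence id with increasing sequence numbers. These mirror Akka Persistence's persistence id and sequence number concepts, and the layout fits Cassandra's partition-per-key model.
- State is rebuilt by replaying the events.

This is not the akka-persistence-cassandra API or table schema. `Journal`, `ShoppingCart`, `total_quantity_added` and the event tuples are all hypothetical.

```python
from collections import defaultdict


class Journal:
    """Hypothetical append-only event journal: persistence id -> [(sequence_nr, event), ...]."""

    def __init__(self):
        self._events = defaultdict(list)

    def append(self, persistence_id, event):
        events = self._events[persistence_id]
        sequence_nr = len(events) + 1  # per-id sequence numbers start at 1 and only increase
        events.append((sequence_nr, event))
        return sequence_nr

    def replay(self, persistence_id):
        # Events for one id, in the order they were written
        return [event for _, event in self._events[persistence_id]]


class ShoppingCart:
    """Write side (the "C" in CQRS): commands become events; state is derived from events."""

    def __init__(self, cart_id, journal):
        self.cart_id = cart_id
        self.journal = journal
        self.items = {}
        for event in journal.replay(cart_id):  # recovery: rebuild state by replaying events
            self._apply(event)

    def add_item(self, sku, qty):  # command: validate, then persist an event
        if qty <= 0:
            raise ValueError("quantity must be positive")
        self._persist(("ItemAdded", sku, qty))

    def remove_item(self, sku):
        if sku in self.items:
            self._persist(("ItemRemoved", sku, 0))

    def _persist(self, event):
        self.journal.append(self.cart_id, event)  # persist the event first ...
        self._apply(event)                         # ... then update in-memory state

    def _apply(self, event):
        kind, sku, qty = event
        if kind == "ItemAdded":
            self.items[sku] = self.items.get(sku, 0) + qty
        elif kind == "ItemRemoved":
            self.items.pop(sku, None)


def total_quantity_added(journal, persistence_id):
    # Read side (the "Q" in CQRS): a separate view projected from the same events
    return sum(qty for kind, _, qty in journal.replay(persistence_id) if kind == "ItemAdded")


if __name__ == "__main__":
    journal = Journal()
    cart = ShoppingCart("cart-1", journal)
    cart.add_item("apple", 2)
    cart.add_item("pear", 1)
    cart.add_item("apple", 3)
    cart.remove_item("pear")
    print(cart.items)                               # {'apple': 5}
    print(ShoppingCart("cart-1", journal).items)    # {'apple': 5} (recovered by replay)
    print(total_quantity_added(journal, "cart-1"))  # 6
```

An append-only, per-entity event stream is a write-heavy, partition-friendly workload. This is one reason the slide pairs Cassandra with Akka Persistence.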

Cassandra Benchmarks
[Figures: University of Toronto, NoSQL Database Performance Benchmarks, 2012. Panels: write latency for workload read/write; throughput for workload read/scan/write; read latency for workload read/write; throughput for workload read/write]

[Figure: Netflix, Benchmarking Cassandra Scalability on AWS, 2011]

[Figure: EndPoint, database and open source consulting company, 2014]


