Apache Cassandra

Agenda
• Introduction
• Where Did Cassandra Come From?
• Why Cassandra?
• Data Model
• CQL (Cassandra Query Language)
• Who Uses Cassandra?
• MySQL Comparison
• Strengths
• Weaknesses

Introduction
Apache Cassandra is an open source distributed database
management system for handling huge amounts of data across
many commodity servers.
Cassandra is a “NoSQL” or “non-relational” database and can be
described as:
• Scalable, fault-tolerant, and tunably consistent.
• A column-oriented (wide-column) database.

Where Did Cassandra Come From?
• Cassandra was initially created at Facebook.
• Its design combines ideas from Google Bigtable (the data model)
and Amazon Dynamo (the distributed architecture).
• It was created to power the “Inbox Search” feature.
• Cassandra was released as open source in July of 2008.
• It became an Apache Incubator project in February of 2009 and
graduated to a top-level project about a year later.

Why Cassandra?
Gigabyte to Petabyte scalability
No single point of failure
Distributed and decentralized
Data replication
High performance
Elastic scalability
Fault tolerant
Flexible schema design
Data Compression
CQL language (like SQL)
No need for special hardware or software

Distributed & Decentralized
● Distributed: Capable of running on multiple machines
● Decentralized: No single point of failure
● No master-slave issues, because of the peer-to-peer architecture
(nodes exchange state through the gossip protocol)
● Read and write requests can be sent to any node

Elastic Scalability
● Cassandra scales horizontally, by adding more machines.
● Adding nodes increases throughput roughly linearly.
● The node count can be increased or decreased seamlessly.
● Capacity scales linearly to terabytes and petabytes of data.

High Availability & Fault Tolerance
● A cluster consists of multiple networked computers.
● Cassandra uses the gossip protocol to detect node failures.
● When a node fails, requests fail over to another part of the
system; the peer-to-peer architecture makes this possible.

Data Replication
● In Cassandra, one or more of the nodes in a cluster act as replicas
for a given piece of data.
● Because the data is replicated, it remains available when a node fails.
[Figure: ring of six nodes, numbered 1 to 6]
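A minimal sketch of how replicas could be chosen on a ring of six nodes. It assumes a simple rule (find the node that owns the key's position, then take the next nodes clockwise) and a replication factor of 3. The node names, the MD5-based hash, and the functions `token_for` and `replicas_for` are illustrative assumptions, not Cassandra's actual partitioner or API.

```python
import hashlib
from bisect import bisect_right

# Hypothetical six-node ring; each node owns one evenly spaced token.
RING_SIZE = 2**32
NODES = [f"node{i}" for i in range(1, 7)]
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]


def token_for(key):
    """Map a key to a ring position (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def replicas_for(key, replication_factor=3):
    """Return the owner of `key` plus the next nodes clockwise on the ring."""
    # First node whose token is greater than the key's token, wrapping around.
    start = bisect_right(TOKENS, token_for(key)) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(replication_factor)]


if __name__ == "__main__":
    print(replicas_for("user:42"))
```

With three replicas per key, any single node can fail and two copies of each piece of data remain reachable.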

Components of Cassandra
The key components of Cassandra are as follows:
Node − A single machine where data is stored.
Data center − A collection of related nodes.
Cluster − A component that contains one or more data centers.
Commit log − A crash-recovery mechanism. Every write operation is
written to the commit log.
Memtable − A memory-resident data structure. After a write is recorded
in the commit log, the data is written to the memtable.
SSTable − A disk file to which data is flushed from the memtable
when its contents reach a threshold value.
Bloom filter − A probabilistic, memory-resident structure that tests
whether an SSTable might contain a given key. It is consulted on reads
so that SSTables that cannot hold the key are skipped.
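The write path described above (commit log, then memtable, then SSTable) can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the class `TinyStore`, its `flush_threshold` parameter, and the use of plain Python lists and dicts in place of on-disk files are all hypothetical, and details such as compaction and Bloom filters are left out.

```python
class TinyStore:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: max memtable entries
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory key -> value
        self.sstables = []     # flushed, read-only snapshots (oldest first)

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write for recovery
        self.memtable[key] = value            # 2. apply it in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush once the threshold is hit

    def flush(self):
        # A sorted copy stands in for an on-disk SSTable file.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            if key in table:
                return table[key]
        return None


if __name__ == "__main__":
    store = TinyStore(flush_threshold=2)
    store.write("a", 1)
    store.write("b", 2)   # triggers a flush
    store.write("a", 3)   # newer value lives in the memtable
    print(store.read("a"), store.read("b"), len(store.sstables))
```

Reads check the memtable first and then the SSTables from newest to oldest, so the most recent value for a key wins.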

Data Model
• Cluster: A Cassandra database is distributed over
several machines that operate together. The
outermost container is known as the cluster.
• Keyspace: The outermost container for data
within a cluster.
• Column families: Represent the structure of the
data. Each keyspace has at least one and often
many column families.
• Two types of column families
– Simple
– Super (nested column families)
• Column: The basic data structure of Cassandra,
made up of three values.
• Each column has
– Key (the column name)
– Value
– Timestamp
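As a rough illustration of the column structure, the sketch below models a column as a (key, value, timestamp) triple and a simple column family as a mapping from row keys to columns. The timestamp is what lets two versions of the same column be reconciled, with the newer write winning. The names `Column` and `reconcile` and the sample data are hypothetical, and the tie-breaking rule here is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str       # the column key
    value: str
    timestamp: int  # e.g. microseconds since the epoch, set at write time


def reconcile(a, b):
    """Pick the newer of two versions of the same column (last write wins)."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.timestamp >= b.timestamp else b


# A simple column family: row key -> {column name -> Column}.
users = {
    "row-1": {
        "name": Column("name", "Alice", 1000),
        "phone": Column("phone", "555-0100", 1000),
    }
}

# A later write to the same column replaces the older version.
newer_phone = Column("phone", "555-0199", 2000)
users["row-1"]["phone"] = reconcile(users["row-1"]["phone"], newer_phone)
```

A super column family would add one more level of nesting, with each row key mapping to named groups of columns.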

[Figures: layout of a simple column family and of a super column family]

CQL (Cassandra Query Language)
• CQL is very similar to SQL (Structured Query
Language) in terms of syntax and commands.
• CQL treats the database (keyspace) as a container of
tables. All statements end with a semicolon.
• CQL can be used interactively through cqlsh, the CQL
shell, or from an application through a language driver.
Using cqlsh, we can:
• define a schema,
• insert data, and
• execute a query.
Relational Model    Cassandra Model
Database            Keyspace
Table               Column Family (CF)
Primary key         Row key
Column name         Column name/key
Column value        Column value

Examples Using CQL
The following slides demonstrate different kinds of CQL
statements, such as DDL and DML, using two example tables:
User
• Id
• Name
• Phone
• Age
Emails
• Id
• Email

Interface DDL
CREATE and DROP apply to: Type, Keyspace, Table, Index, Trigger
ALTER applies to: Type, Keyspace, Table
CREATE KEYSPACE - Creates a keyspace in Cassandra.
USE - Selects an existing keyspace for the current session.
ALTER KEYSPACE - Changes the properties of a keyspace.
DROP KEYSPACE - Removes a keyspace.
CREATE TABLE - Creates a table in a keyspace.
ALTER TABLE - Modifies the columns or properties of a table.
DROP TABLE - Removes a table.
TRUNCATE - Removes all the data from a table.
CREATE INDEX - Defines a new index on a single column of a table.
DROP INDEX - Deletes a named index.
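The DDL statements listed above might look as follows for the User and Emails examples. The CQL is held in a Python list so that each statement can be printed and pasted into cqlsh. The keyspace name `demo`, the column types, the primary keys, the index name `users_age_idx`, and the extra `city` column are assumptions made for illustration.

```python
# CQL DDL statements for a hypothetical keyspace named "demo".
# Table and column names follow the User and Emails examples;
# the types and primary keys are assumptions.
DDL_STATEMENTS = [
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};",
    "USE demo;",
    "CREATE TABLE IF NOT EXISTS users ("
    "id int PRIMARY KEY, name text, phone text, age int);",
    "CREATE TABLE IF NOT EXISTS emails ("
    "id int, email text, PRIMARY KEY (id, email));",
    "CREATE INDEX IF NOT EXISTS users_age_idx ON users (age);",
    "ALTER TABLE users ADD city text;",  # 'city' is a made-up extra column
    "TRUNCATE emails;",
    "DROP INDEX IF EXISTS users_age_idx;",
    "DROP TABLE IF EXISTS emails;",
    "DROP KEYSPACE IF EXISTS demo;",
]

if __name__ == "__main__":
    for statement in DDL_STATEMENTS:
        print(statement)
```

The emails table uses (id, email) as its primary key so that one user id can have several email rows; that is a design choice of this sketch, not something the slides specify.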

Interface DML
DML statements: SELECT, INSERT, UPDATE, DELETE
INSERT - Adds columns for a row in a table.
UPDATE - Updates a column of a row.
DELETE - Deletes data from a table.
BATCH - Executes multiple DML statements at once.

CQL Clauses
SELECT - Reads data from a table.
WHERE - Used with SELECT to read only the rows that match a condition.
ORDER BY - Used with SELECT to return the matching rows in a
specific order.
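A short sketch of the DML statements and query clauses in use, again as CQL held in Python strings that can be pasted into cqlsh. It assumes two hypothetical tables, `users` and `emails`, with the schema shown in the comments; the sample values are made up.

```python
# CQL DML statements, assuming these hypothetical tables already exist:
#   users  (id int PRIMARY KEY, name text, phone text, age int)
#   emails (id int, email text, PRIMARY KEY (id, email))
DML_STATEMENTS = [
    "INSERT INTO users (id, name, phone, age) "
    "VALUES (1, 'Alice', '555-0100', 30);",
    "UPDATE users SET phone = '555-0199' WHERE id = 1;",
    "BEGIN BATCH "
    "INSERT INTO emails (id, email) VALUES (1, 'alice@example.com'); "
    "INSERT INTO emails (id, email) VALUES (1, 'alice@example.org'); "
    "APPLY BATCH;",
    "DELETE FROM emails WHERE id = 1 AND email = 'alice@example.org';",
]

# SELECT with WHERE, and with ORDER BY on the clustering column `email`.
QUERIES = [
    "SELECT name, phone FROM users WHERE id = 1;",
    "SELECT email FROM emails WHERE id = 1 ORDER BY email DESC;",
]

if __name__ == "__main__":
    for statement in DML_STATEMENTS + QUERIES:
        print(statement)
```

In this sketch ORDER BY is applied to `email` because it is the clustering column of the assumed emails table, and the query restricts the partition key `id` with an equality condition.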

Who Uses Cassandra?
• Facebook
• WalmartLabs
• Constant Contact
• Digg
• AppScale
• Netflix
• Twitter
• Zoho
• IBM
• FormSpring
• Cisco WebEx
• Rackspace
• OpenX
• Adobe
• Comcast
• eBay

MySQL Comparison
                 Cassandra    MySQL
Average write    0.12 ms      ~300 ms
Average read     15 ms        ~350 ms
Statistics based on 50 GB of data.
Figures reported by Cassandra's authors, using Facebook data.

Strengths
● Flexible data model
Supports modern data types with fast writes and reads.
● Peer-to-peer architecture
Cassandra follows a peer-to-peer architecture instead of a
master-slave architecture.
● Schema-free/schema-less
In Cassandra, columns can be created at will within rows.
The Cassandra data model is often described as a schema-optional
data model.
● AP in CAP terms
Cassandra is typically classified as an AP system, meaning that
availability and partition tolerance are generally considered
more important than consistency.
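Although Cassandra is classified as AP, the consistency of each request can be tuned by choosing how many replicas must respond (consistency levels such as ONE, QUORUM, and ALL). The arithmetic behind that trade-off can be sketched as follows; the function names are hypothetical, and the rule shown is the standard overlap condition R + W > RF rather than anything specific to these slides.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3
    q = quorum(rf)
    print(q, reads_see_latest_write(q, q, rf), reads_see_latest_write(1, 1, rf))
```

Reading and writing at quorum gives overlapping replica sets at the cost of waiting for more nodes, while reading and writing a single replica favors availability and latency.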

Strengths (continued)
● Linear scale performance
The ability to add nodes without failures leads to predictable increases
in performance.
● Supports multiple languages
Python, C#/.NET, C++, Ruby, Java, Go, and many more.
● Operational and developmental simplicity
There are no complex software tiers to be managed, so administration
duties are greatly simplified.
● Ability to deploy across data centers
Cassandra can be deployed across multiple, geographically dispersed data
centers.
● Cloud availability
Cassandra can be installed in cloud environments.

Weaknesses
Use cases where it is better to avoid Cassandra, and known limitations:
● If too many joins are required to retrieve the data.
● To store configuration data.
● During compaction, things slow down and throughput
degrades.
● Aggregation support is limited compared with a relational
database.
● Range queries on the partition key are not supported directly;
the partition key is normally matched by equality.
● If the data is transactional and requires strict consistency.
● Cassandra can update and delete data, but it is not
designed for workloads dominated by updates and deletes.
