NoSQL Session II

Agenda
• Session I recap
- Why NoSQL / Drawbacks of Relational Databases
- Common Characteristics
- Storage Mechanism
- CAP Theorem & Advantages
• DataStax Apache Cassandra Installation
• Cassandra Concepts

Features of Cassandra
• Column-based (wide-column) storage mechanism
• High availability
• High scalability / horizontal scaling
• Predictable performance
• No SPOF (single point of failure)
• Multi-DC (data center) / multi-region availability
• Commodity Hardware
• Easy to manage operationally

• Node – one Cassandra instance
• Rack – a logical set of nodes
• Data Center – a logical set of racks
• Cluster – the full set of nodes that map to a single complete token ring
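The phrase "single complete token ring" can be made concrete with a small model. The Python sketch below is a toy version under several assumptions. Each node holds exactly one token (no virtual nodes). Tokens come from a toy 0..99 space. MD5 stands in for the hash. Cassandra itself defaults to the Murmur3Partitioner and uses a far larger token range. The node names and token values are hypothetical.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node owns the range (previous token, its own token]."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token; keep the ring sorted by token.
        self.ring = sorted((token, name) for name, token in node_tokens.items())
        self.tokens = [token for token, _ in self.ring]

    def owner(self, token):
        # First node whose token is >= the given token; past the last node, wrap to the first.
        i = bisect.bisect_left(self.tokens, token)
        return self.ring[i % len(self.ring)][1]


def key_token(key, ring_size=100):
    # Stand-in hash over a toy 0..99 token space (not Cassandra's Murmur3 partitioner).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


ring = TokenRing({"n1": 25, "n2": 50, "n3": 75, "n4": 90})
print(ring.owner(30))   # n2: token 30 falls in (25, 50]
print(ring.owner(95))   # n1: past the highest token, ownership wraps around the ring
print(ring.owner(key_token("user:42")))
```

Because every token maps to exactly one range, adding a node only splits one neighbour's range. The rest of the ring is unaffected.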

CQL
• CREATE KEYSPACE keyspace_name WITH
replication = {'class': 'strategy_name',
'replication_factor': number_of_replicas};
• CREATE TABLE table_name (column1 data_type,
column2 data_type, column3 data_type,
PRIMARY KEY (column1));
• Declare the primary key only once: either inline after a column (column1 data_type PRIMARY KEY) or as a separate PRIMARY KEY clause, not both.

The replication option specifies the replica placement strategy and the number of replicas wanted. The following table lists the two commonly used replica placement strategies.

Strategy name           | Description
SimpleStrategy          | Specifies a single replication factor for the whole cluster.
NetworkTopologyStrategy | Sets the replication factor for each data center independently.
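The two strategies differ only in the shape of the replication map. SimpleStrategy takes one replication_factor. NetworkTopologyStrategy takes one entry per data center. The Python sketch below renders both forms as CQL text. Several names here are assumptions: keyspace_cql is a hypothetical helper, not part of any Cassandra driver, and the keyspace name shop and the data-center names dc1/dc2 are illustrative. In a real cluster, the data-center names must match the names reported by the snitch.

```python
def keyspace_cql(name, strategy, replication):
    """Build a CREATE KEYSPACE statement (hypothetical helper, not a driver API).

    replication: an int replication factor for SimpleStrategy, or a
    {data_center_name: replication_factor} dict for NetworkTopologyStrategy.
    """
    if strategy == "SimpleStrategy":
        # One replication factor for the whole cluster.
        options = {"class": strategy, "replication_factor": int(replication)}
    elif strategy == "NetworkTopologyStrategy":
        # One replication factor per data center, set independently.
        options = {"class": strategy, **{dc: int(rf) for dc, rf in replication.items()}}
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    # Render the map in CQL syntax: quoted keys, quoted class name, bare integers.
    parts = []
    for key, value in options.items():
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    return f"CREATE KEYSPACE {name} WITH replication = {{{', '.join(parts)}}};"


print(keyspace_cql("shop", "SimpleStrategy", 3))
print(keyspace_cql("shop", "NetworkTopologyStrategy", {"dc1": 3, "dc2": 2}))
```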

CONSISTENCY
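• A consistency level is set for each read and write operation: it is the number of replicas that must respond before the operation succeeds.
• Examples: ANY (writes only), ONE, QUORUM (floor(RF/2) + 1 replicas), LOCAL_QUORUM, EACH_QUORUM, ALL
• Higher consistency (more replicas must respond) – lower availability
• Lower consistency (fewer replicas must respond) – higher availability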
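The quorum formula and the consistency/availability trade-off come down to simple arithmetic. The Python sketch below assumes a single data center, and its function names are illustrative. Its check uses the usual overlap argument: if read replicas plus write replicas exceed RF (R + W > RF), every read set shares at least one replica with the latest acknowledged write.

```python
def quorum(replication_factor):
    # QUORUM = floor(RF / 2) + 1 replicas must respond.
    return replication_factor // 2 + 1


def is_strongly_consistent(rf, read_replicas, write_replicas):
    # Read and write replica sets are guaranteed to overlap when R + W > RF.
    return read_replicas + write_replicas > rf


def tolerated_failures(rf, required_replicas):
    # How many replicas may be down while requests at this level still succeed.
    return rf - required_replicas


rf = 3
q = quorum(rf)
print(q, tolerated_failures(rf, q))        # 2 1
print(is_strongly_consistent(rf, q, q))    # True: QUORUM reads + QUORUM writes overlap
print(is_strongly_consistent(rf, 1, 1))    # False: ONE reads + ONE writes may miss the latest write
print(tolerated_failures(rf, rf))          # 0: ALL fails if any replica is down
```

The last line illustrates "high consistency, low availability". ALL sees every replica but cannot tolerate a single failure.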

SEED & COORDINATOR NODES
• Seeds and coordinators serve different purposes.
• Seed nodes: new nodes contact the seeds (listed under seed_provider in the cassandra.yaml file) to join the cluster and start gossiping. It is generally recommended to have at least 2 seeds. In a multi-datacenter cluster, distribute the seeds across the data centers.
• Coordinator nodes: by design, every node can act as a coordinator. The coordinator for a request is the node the client sends it to. The choice is therefore made by the client driver's load-balancing policy, for example round-robin or DC-aware round-robin, optionally combined with latency-aware or token-aware routing. This policy is configured in the client driver, not in cassandra.yaml.
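Coordinator selection can be modelled as a client-side policy that picks a node per request. The Python sketch below shows round-robin and DC-aware round-robin. RoundRobin and DCAwareRoundRobin are hypothetical toy classes. The DataStax Python driver's real counterparts, RoundRobinPolicy and DCAwareRoundRobinPolicy in cassandra.policies, have a different interface. The node names are illustrative.

```python
import itertools


class RoundRobin:
    """Toy policy: each request goes to the next node in turn."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        return next(self._cycle)


class DCAwareRoundRobin:
    """Toy policy: round-robin over local-DC nodes, falling back to remote ones."""

    def __init__(self, nodes, local_dc):
        # nodes: list of (name, dc) pairs.
        self.local = [name for name, dc in nodes if dc == local_dc]
        self.remote = [name for name, dc in nodes if dc != local_dc]
        self._i = 0

    def pick(self, down=frozenset()):
        # Rotate the starting local node on each call, then try remote nodes last.
        start = self._i
        self._i += 1
        local = [self.local[(start + k) % len(self.local)] for k in range(len(self.local))]
        for name in local + self.remote:
            if name not in down:
                return name
        raise RuntimeError("no live coordinator available")


nodes = [("a", "dc1"), ("b", "dc1"), ("c", "dc2")]
rr = RoundRobin([name for name, _ in nodes])
print([rr.pick() for _ in range(4)])     # ['a', 'b', 'c', 'a']
dc_aware = DCAwareRoundRobin(nodes, local_dc="dc1")
print(dc_aware.pick(), dc_aware.pick())  # a b
print(dc_aware.pick(down={"a", "b"}))    # c: local DC unavailable, fall back to dc2
```

The DC-aware variant keeps traffic in the local data center. It crosses to a remote DC only when no local node is usable.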

• Maximum columns (cells) per row (partition): 2 billion in theory. In practice, about 10,000 to 20,000 columns is a typical upper bound.
• Maximum data size per cell (column value): 2 GB in theory. In practice, values are usually kept to about 10 MB or less.

CLUSTER TOPOLOGY
• Cluster communication – snitch and gossip
• Hinted handoff
• Write path
• Read path
• Read Repair
• Configuration – cassandra.yaml file

SNITCHES & GOSSIP
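• Snitch – maps each node's IP address to its rack and data center, so Cassandra knows the cluster topology
• Using this topology, Cassandra does its best not to place more than one replica on the same rack, so a single rack failure cannot take out several copies of the same data
• Gossip – once per second, each node exchanges state information with up to three other nodes, so every node learns which nodes are up, down, or joining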
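Rack-aware placement can be sketched on top of a token ring. The Python function below is a simplified model, loosely based on how a rack-aware strategy treats a single data center. It walks the ring clockwise from the key's owner and skips nodes on racks that already hold a replica. It reuses those racks only when every rack is taken. place_replicas, the ring tuples, and the rack names are all hypothetical.

```python
import bisect


def place_replicas(ring, token, rf):
    """Pick rf replicas for a token, preferring distinct racks (toy model).

    ring: list of (token, node_name, rack) tuples for one data center.
    """
    ring = sorted(ring)
    tokens = [t for t, _, _ in ring]
    # Start at the node that owns the token and walk clockwise around the ring.
    start = bisect.bisect_left(tokens, token)
    walk = [ring[(start + k) % len(ring)] for k in range(len(ring))]

    replicas, racks_used, skipped = [], set(), []
    for _, name, rack in walk:
        if len(replicas) == rf:
            break
        if rack in racks_used:
            skipped.append(name)  # defer: same rack as an existing replica
        else:
            replicas.append(name)
            racks_used.add(rack)
    # More replicas requested than racks available: fill from skipped nodes in ring order.
    for name in skipped:
        if len(replicas) == rf:
            break
        replicas.append(name)
    return replicas


ring = [(25, "n1", "rack1"), (50, "n2", "rack1"), (75, "n3", "rack2"), (90, "n4", "rack2")]
print(place_replicas(ring, 10, 2))  # ['n1', 'n3']: n2 is skipped because it shares rack1 with n1
print(place_replicas(ring, 10, 3))  # ['n1', 'n3', 'n2']: only two racks, so rack1 is reused
```

A naive "next node clockwise" rule would have put both replicas for token 10 on rack1.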

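• Hinted handoff – a recovery mechanism for writes that target offline nodes: the coordinator stores a "hint" for the down replica and replays it when the replica comes back
• The hint window (how long a node may be down while hints are still generated for it) is set in the cassandra.yaml file
• Property – max_hint_window_in_ms: 10800000 (the default, 3 hours)
• Property – hinted_handoff_enabled: true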
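The hint lifecycle can be modelled in a few lines. The Python sketch below stores a hint while a replica is down, stops storing hints once the replica has been down longer than the window, and replays the hints on recovery. Coordinator and its methods are hypothetical. max_hint_window_ms only mirrors the idea behind max_hint_window_in_ms. Time is passed in explicitly, in milliseconds, to keep the model deterministic.

```python
class Coordinator:
    """Toy coordinator that stores hints for replicas that are down."""

    def __init__(self, max_hint_window_ms=3 * 60 * 60 * 1000):
        self.max_hint_window_ms = max_hint_window_ms
        self.down_since = {}  # replica -> time (ms) it was marked down
        self.hints = {}       # replica -> list of pending (key, value) writes
        self.stores = {}      # replica -> dict of applied writes

    def mark_down(self, replica, now_ms):
        self.down_since.setdefault(replica, now_ms)

    def write(self, replicas, key, value, now_ms):
        for r in replicas:
            if r not in self.down_since:
                # Replica is up: apply the write directly.
                self.stores.setdefault(r, {})[key] = value
            elif now_ms - self.down_since[r] <= self.max_hint_window_ms:
                # Replica is down but still inside the hint window: keep a hint.
                self.hints.setdefault(r, []).append((key, value))
            # Otherwise the replica has been down too long: no hint is stored,
            # so the missed write must be recovered by repair instead.

    def mark_up(self, replica):
        # Replay stored hints to the recovered replica, then drop them.
        self.down_since.pop(replica, None)
        for key, value in self.hints.pop(replica, []):
            self.stores.setdefault(replica, {})[key] = value


c = Coordinator(max_hint_window_ms=1000)
c.mark_down("n2", now_ms=0)
c.write(["n1", "n2"], "k1", "v1", now_ms=500)   # n2 down, inside the window: hinted
c.write(["n1", "n2"], "k2", "v2", now_ms=5000)  # n2 down, outside the window: not hinted
c.mark_up("n2")
print(c.stores["n2"])  # {'k1': 'v1'}
```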

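SSTable – Sorted String Table
• Immutable data file for row storage: once written to disk, it is never modified
• A partition's data can be spread across multiple SSTables. On read, they are merged and the value with the latest write timestamp wins
• Easy backup – because SSTables are immutable, they can be copied safely
• Deletes do not remove data in place; they are written as markers called "tombstones"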
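Timestamp-based merging and tombstones can be illustrated together. The Python sketch below models each SSTable as an immutable dict of key -> (write_timestamp, value). It resolves a read by last-write-wins, where a tombstone is treated as the newest "value" and hides older data. merge_read, TOMBSTONE, and the sample data are all hypothetical.

```python
TOMBSTONE = object()  # marker written by a delete instead of removing data in place


def merge_read(sstables, key):
    """Return the current value of key across several immutable SSTables (toy model).

    Each sstable is a dict: key -> (write_timestamp, value). The newest
    timestamp wins; if the newest entry is a tombstone, the key reads as deleted.
    """
    newest = None
    for table in sstables:
        if key in table:
            ts, value = table[key]
            if newest is None or ts > newest[0]:
                newest = (ts, value)
    if newest is None or newest[1] is TOMBSTONE:
        return None
    return newest[1]


# Three flushes touching the same keys, oldest to newest (hypothetical data).
sst1 = {"user:1": (100, "alice@old.example"), "user:2": (100, "bob@example")}
sst2 = {"user:1": (200, "alice@new.example")}
sst3 = {"user:2": (300, TOMBSTONE)}  # user:2 was deleted

print(merge_read([sst1, sst2, sst3], "user:1"))  # alice@new.example
print(merge_read([sst1, sst2, sst3], "user:2"))  # None
```

Only timestamps decide the winner, so the result does not depend on the order in which the SSTables are scanned.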

Read Path
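• Read repair – when a read finds that a replica holds stale data, the coordinator writes the newest version back to that replica
• Property – read_repair_chance (a per-table option giving the probability that a read triggers read repair; removed in Cassandra 4.0)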
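The repair step itself is simple to model. The Python sketch below reads one key from every replica, returns the value with the newest timestamp, and writes it back to any replica that was stale or missing the key. It simplifies real Cassandra, where the coordinator compares digests from the replicas it contacts rather than reading full values from all of them. read_with_repair and the replica data are hypothetical.

```python
def read_with_repair(replicas, key):
    """Toy coordinator read with repair.

    replicas: dict replica_name -> dict key -> (timestamp, value).
    Returns (newest_value, list_of_repaired_replicas).
    """
    responses = {name: store.get(key) for name, store in replicas.items()}
    present = [r for r in responses.values() if r is not None]
    if not present:
        return None, []
    # Last write wins: pick the response with the highest timestamp.
    newest = max(present, key=lambda r: r[0])
    repaired = []
    for name, resp in responses.items():
        if resp is None or resp[0] < newest[0]:
            replicas[name][key] = newest  # write the newest version back to the stale replica
            repaired.append(name)
    return newest[1], repaired


replicas = {"n1": {"k": (200, "new")}, "n2": {"k": (100, "old")}, "n3": {}}
print(read_with_repair(replicas, "k"))  # ('new', ['n2', 'n3'])
print(replicas["n2"]["k"])              # (200, 'new')
```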

Thank You!
To be continued in the next session
