NoSQL Cassandra
July 3, 2014
Prashanth M S
NoSQL
Why NoSQL?
Growth in data volume led to handling it with clusters of small machines
(scale out), but RDBMSs are not designed to run on clusters
Bigtable from Google and Dynamo from Amazon were the alternatives for
data storage that emerged in the mid-2000s
Common characteristics of NoSQL DBs are
◦ Not using the relational model
◦ Running well on clusters
◦ Schemaless, open-source and built for 21st century web estates
Types of NoSQL DBs
Aggregate-Oriented DBs
◦ Key-Value Data Model: Amazon DynamoDB
◦ Document Model: MongoDB, CouchDB
◦ Column-Family Model: Cassandra, HBase
Graph DBs
◦ Neo4j, InfiniteGraph
Cassandra Data Model
The table below maps relational-model terms to their Cassandra counterparts
A Cassandra column family can be thought of as a map of maps
◦ Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
Relational Model    Cassandra Model
Database            Keyspace
Table               Column Family
Primary Key         Row Key
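The map-of-maps view above can be sketched in a few lines of Python. This is only a toy model for illustration, not how Cassandra stores data internally; the class name `ColumnFamily` and its methods are hypothetical.

```python
from bisect import insort


class ColumnFamily:
    """Toy model of Map<RowKey, SortedMap<ColumnKey, ColumnValue>>."""

    def __init__(self):
        # row key -> (sorted list of column keys, dict of column values)
        self._rows = {}

    def put(self, row_key, column_key, value):
        keys, values = self._rows.setdefault(row_key, ([], {}))
        if column_key not in values:
            insort(keys, column_key)  # keep column keys in sorted order
        values[column_key] = value

    def get_row(self, row_key):
        """Return the row's columns as (key, value) pairs, sorted by column key."""
        keys, values = self._rows.get(row_key, ([], {}))
        return [(k, values[k]) for k in keys]


users = ColumnFamily()
users.put("jim", "gender", "M")
users.put("jim", "age", 36)
users.put("jim", "car", "camaro")
users.put("johnny", "gender", "M")
users.put("johnny", "age", 12)  # no "car" column: rows need not share columns
```

Each row carries only the columns that were written to it, which is what makes the model schemaless at the row level.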
Cassandra Key Components
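Gossip
◦ Peer-to-peer communication protocol through which the nodes of a cluster exchange state information
Partitioner
◦ Determines how data is distributed across the nodes of a cluster
Replication Strategy
◦ Determines which nodes store the replicas of each row
Snitch
◦ Describes the network topology (datacenters and racks) to Cassandra
cassandra.yaml
◦ Main configuration file: timeout settings, tuning properties, etc.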
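As an illustration of what a replication strategy decides, here is a minimal sketch modeled on the idea behind SimpleStrategy: the first replica goes to the node chosen by the partitioner and the remaining replicas go to the next nodes clockwise around the ring. The function name and the four-node ring are assumptions made for this example, and topology awareness (the snitch's job) is ignored.

```python
def simple_strategy_replicas(ring, owner, replication_factor):
    """Return the nodes that hold replicas for data owned by `owner`.

    `ring` lists the nodes in ring order (hypothetical input format).
    The number of replicas is capped at the number of nodes.
    """
    start = ring.index(owner)
    count = min(replication_factor, len(ring))
    # Walk clockwise from the owner, wrapping around the end of the ring.
    return [ring[(start + i) % len(ring)] for i in range(count)]


ring = ["A", "B", "C", "D"]
replicas = simple_strategy_replicas(ring, "D", 3)
```

With a replication factor of 3, data owned by node D is also copied to A and B, because the walk wraps around the ring.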
Cassandra Storage
Writes are appended to the commit log and stored in an in-memory memtable.
When the memtable fills up, its data is flushed to SSTables on disk. Data in the commit
log is purged after its corresponding data in the memtable is flushed to
an SSTable.
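The relationship between the commit log, the memtable and SSTables can be sketched as below. This is a simplified in-memory model, not Cassandra's implementation: the class `ToyNode` and the `memtable_limit` parameter are hypothetical, SSTables are plain dictionaries rather than files, and the whole commit log is cleared on flush, whereas a real node manages it in segments.

```python
class ToyNode:
    """Toy write path: commit log + memtable, flushed to immutable SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # in-memory view of recent writes
        self.sstables = []     # flushed tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # logged first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written sorted by key and never modified afterwards.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable.clear()
        # The flushed data is now on "disk", so its log entries can be purged.
        self.commit_log.clear()


node = ToyNode(memtable_limit=2)
node.write("jim", 36)
log_after_first_write = list(node.commit_log)
node.write("carol", 37)  # reaches the limit and triggers a flush
```

After the second write the memtable and commit log are empty and one SSTable holds both rows.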
Cassandra Data Partitioning
Let's say we have the following data
jim     age: 36   car: camaro   gender: M
carol   age: 37   car: bmw      gender: F
johnny  age: 12                 gender: M
suzy    age: 10                 gender: F
Data is placed on each node based on the partition key and the token range the
node is responsible for
Node  Start Range      End Range        Partition Key  Hash Value
A     -9223372036854   -4611686018427   johnny         -6723372854875
B     -4611686018427   -1               jim            -2245462676723
C     0                4611686018427    suzy           1168604627387
D     4611686018427    9223372036854    carol          7723358927203
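The placement shown in the table can be reproduced with a simple range lookup. In this sketch the ranges and hash values are copied from the table as printed rather than computed by a real partitioner, and the function name `node_for_token` is hypothetical.

```python
# (node, start of range, end of range), copied from the table above.
RANGES = [
    ("A", -9223372036854, -4611686018427),
    ("B", -4611686018427, -1),
    ("C", 0, 4611686018427),
    ("D", 4611686018427, 9223372036854),
]

# Partition key -> hash value, also copied from the table (not computed here).
HASHES = {
    "johnny": -6723372854875,
    "jim": -2245462676723,
    "suzy": 1168604627387,
    "carol": 7723358927203,
}


def node_for_token(token):
    """Return the first node whose range contains `token`.

    Both ends are treated as inclusive; a token on a shared boundary
    goes to the first matching node, which is an assumption of this sketch.
    """
    for node, start, end in RANGES:
        if start <= token <= end:
            return node
    raise ValueError(f"token {token} is outside every range")


placement = {key: node_for_token(token) for key, token in HASHES.items()}
```

The resulting `placement` maps johnny to A, jim to B, suzy to C and carol to D, matching the table.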
Cassandra Data Distribution using Vnodes
Vnodes allow each node to own a large number of small partition
ranges distributed throughout the cluster
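To make the idea concrete, the following sketch gives every node several randomly placed tokens instead of one contiguous range, then looks up the owner of a token as the node holding the first token at or after it, wrapping around the ring. The function names, the seed and the figure of 8 tokens per node are assumptions chosen for the example, not Cassandra defaults.

```python
import bisect
import random
from collections import Counter


def build_ring(nodes, tokens_per_node, seed=0):
    """Assign `tokens_per_node` random tokens to each node; return them sorted."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    ring = []
    for node in nodes:
        for _ in range(tokens_per_node):
            ring.append((rng.randrange(-2**63, 2**63), node))
    ring.sort()
    return ring


def owner(ring, token):
    """Node whose token is the first one >= `token`, wrapping past the end."""
    tokens = [t for t, _ in ring]
    index = bisect.bisect_left(tokens, token) % len(ring)
    return ring[index][1]


ring = build_ring(["A", "B", "C", "D"], tokens_per_node=8)
ranges_per_node = Counter(node for _, node in ring)
```

Each node ends up owning 8 small ranges scattered around the ring rather than one large range.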
Q & A
