Introduction to Cassandra

INTRODUCTION TO
APACHE CASSANDRA
Gökhan Atıl

GÖKHAN ATIL
➤ Database Administrator
➤ Oracle ACE Director (2016), Oracle ACE (2011)
➤ 10g/11g and R12 Oracle Certified Professional (OCP)
➤ Co-author of Expert Oracle Enterprise Manager 12c
➤ Founding Member and Vice President of TROUG
➤ Blogger (since 2008) gokhanatil.com
➤ Twitter: @gokhanatil

INTRODUCTION TO APACHE CASSANDRA
➤ What is Apache Cassandra? Why use it?
➤ Cassandra Architecture
➤ Cassandra Query Language (CQL)
➤ Cassandra Data Modeling
➤ How to install and run Cassandra?
➤ Cassandra nodetool
➤ Backup and Recovery

WHAT IS APACHE CASSANDRA? WHY USE IT?
➤ Fast, distributed (column-family NoSQL) database: high availability, linear scalability, high performance
➤ Fault tolerant on Commodity Hardware
➤ Multi-Data Center Support
➤ Easy to operate
➤ Proven: CERN, Netflix, eBay, GitHub, Instagram, Reddit

HIGH AVAILABILITY: CAP THEOREM AND CASSANDRA
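[Figure: CAP triangle with corners Consistency, Availability, and Partition Tolerance]
➤ RDBMS: Consistency (ACID: Atomicity, Consistency, Isolation, Durability)
➤ Cassandra: Availability and Partition Tolerance, with tunable consistency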

HIGH AVAILABILITY: THE RING
➤ No master, no slave: every node is a peer (peer-to-peer)
➤ Nodes share their state with each other through gossip (e.g. "I'm online!")

CASSANDRA PARTITIONS
EMAIL | NAME | PHONE
gokhan@ | Gokhan | 542xxxxxxx
aylin@ | Aylin | 532xxxxxxx
ilayda@ | Ilayda | 532xxxxxxx
➤ PRIMARY KEY = PARTITION KEY, CLUSTERING KEY
➤ The partitioner hashes the partition key to place each row on the ring

REPLICATION FACTOR
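[Figure: the partition key EMAIL = gokhan@ is hashed by Murmur3Partitioner to a token (# 60 in the example), and the token selects the replica nodes on the ring]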
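
The token-to-replica mapping on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the ring size of 100, the node names and tokens, and the helper names token_for and replicas_for are all assumptions made for the example, and MD5 stands in for Murmur3 (which is not in the Python standard library). Replica placement here follows the simple "next nodes clockwise" rule.

```python
import hashlib
from bisect import bisect_left


def token_for(partition_key, ring_size=100):
    """Map a partition key to a token in [0, ring_size).

    MD5 is only a stand-in; Cassandra's default partitioner is Murmur3.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_size


def replicas_for(token, node_tokens, rf):
    """Walk the ring clockwise from the token and pick rf distinct nodes."""
    ring = sorted(node_tokens.items(), key=lambda item: item[1])
    tokens = [t for _, t in ring]
    # First node whose token is >= the row's token; wrap around past the end.
    start = bisect_left(tokens, token) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][0] for i in range(count)]


# Hypothetical four-node ring; token 60 is the value shown on the slide.
nodes = {"node1": 0, "node2": 25, "node3": 50, "node4": 75}
print(replicas_for(60, nodes, rf=3))  # ['node4', 'node1', 'node2']
```

With a replication factor of 3, the row is stored on the node that owns the token range plus the next two nodes on the ring.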

WRITE PATH (CLUSTER)
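[Figure: the client sends a write to a coordinator node, which forwards it to the replica nodes; hinted handoff covers replicas that are down]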
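
Hinted handoff can be illustrated with a small in-memory model. The sketch below is an assumption-laden toy, not Cassandra code: the Replica and Coordinator classes and their methods are hypothetical names, and real hints are persisted and expire after a configured window.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas
        self.hints = []  # (replica, key, value) for writes a down replica missed

    def write(self, key, value):
        """Send the write to every replica; keep a hint for each one that is down."""
        acks = 0
        for replica in self.replicas:
            if replica.up:
                replica.data[key] = value
                acks += 1
            else:
                self.hints.append((replica, key, value))
        return acks

    def replay_hints(self):
        """Deliver stored hints to replicas that have come back online."""
        remaining = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.data[key] = value
            else:
                remaining.append((replica, key, value))
        self.hints = remaining


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c])
c.up = False
acks = coordinator.write("gokhan@", "542")
print(acks, c.data)  # 2 {}
c.up = True
coordinator.replay_hints()
print(c.data)  # {'gokhan@': '542'}
```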

WRITE PATH (NODE)
➤ Logging data in the commit log
➤ Writing data to the memtable
➤ Flushing to immutable SSTables (Sorted String Tables)
[Figure: the memtable lives in memory; the commit log and the SSTables live on disk; a flush writes the memtable to a new SSTable, and compaction merges SSTables]
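
The three steps above, plus compaction, can be sketched as a toy storage engine. Everything here is a simplification under stated assumptions: the Node class, its memtable_limit parameter, and the rule "flush when the memtable holds N keys" are invented for the example, and the commit log is an in-memory list rather than a file.

```python
class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only log; stands in for the on-disk commit log
        self.memtable = {}     # in-memory map: key -> (timestamp, value)
        self.sstables = []     # immutable, key-sorted tuples; stand in for files on disk
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. log the write
        self.memtable[key] = (timestamp, value)          # 2. write to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                                 # 3. flush when full

    def flush(self):
        """Write the memtable out as a new immutable, sorted SSTable."""
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need to be replayed

    def compact(self):
        """Merge all SSTables into one, keeping the newest version of each key."""
        merged = {}
        for sstable in self.sstables:
            for key, (timestamp, value) in sstable:
                if key not in merged or timestamp > merged[key][0]:
                    merged[key] = (timestamp, value)
        self.sstables = [tuple(sorted(merged.items()))]


node = Node(memtable_limit=2)
node.write("aylin@", "532", timestamp=1)
node.write("gokhan@", "542", timestamp=2)   # memtable full: first flush
node.write("gokhan@", "535", timestamp=3)
node.write("ilayda@", "532", timestamp=4)   # memtable full: second flush
print(len(node.sstables))  # 2
node.compact()
print(dict(node.sstables[0])["gokhan@"])  # (3, '535')
```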

READ PATH (CLUSTER)
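[Figure: the client sends a read to a coordinator node, which requests the data from one replica and digests from the others]
➤ Read Repair: replicas are repaired during the read path by comparing digests and timestamps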
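
A rough sketch of the digest comparison behind read repair follows. It assumes each replica is a plain dict mapping a key to a (timestamp, value) pair; the function names digest and read_with_repair are hypothetical, and MD5 is used only as a convenient hash for the illustration.

```python
import hashlib


def digest(value):
    """Hash of a replica's answer, so replicas can be compared cheaply."""
    return hashlib.md5(repr(value).encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """Read full data from the first replica and digests from the rest.

    On a digest mismatch, the version with the newest timestamp wins and
    is written back to every replica.
    """
    data_replica, *digest_replicas = replicas
    data = data_replica.get(key)
    if all(digest(r.get(key)) == digest(data) for r in digest_replicas):
        return data[1] if data else None
    versions = [r[key] for r in replicas if key in r]
    newest = max(versions, key=lambda version: version[0])
    for r in replicas:
        r[key] = newest  # repair stale or missing copies
    return newest[1]


r1 = {"gokhan@": (1, "542")}   # stale copy
r2 = {"gokhan@": (2, "535")}   # newest copy
r3 = {}                        # missed the write entirely
result = read_with_repair([r1, r2, r3], "gokhan@")
print(result)  # 535
print(r1 == r2 == r3)  # True
```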

READ PATH (NODE)
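[Figure: read path inside a node, split into memory and disk]
➤ In memory: memtable, row (read) cache, bloom filter, partition key cache, partition summary
➤ On disk: partition index, SSTable
➤ Bloom filter: answers "maybe" or "no"; on "no" the SSTable is skipped
➤ Partition key cache: if the key is found, the partition summary and partition index are skipped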
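
The "maybe or no" behavior of the bloom filter can be shown with a small sketch. The BloomFilter class, its size and hash count, and the read helper are assumptions for illustration; the caches, partition summary, and partition index are left out, and the sketch returns the first match instead of merging versions by timestamp as Cassandra does.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        """False means "no" for certain; True only means "maybe"."""
        return all(self.bits[pos] for pos in self._positions(key))


def read(key, memtable, sstables):
    """Check the memtable first, then SSTables from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for bloom, data in reversed(sstables):
        # Only touch the SSTable when its bloom filter says "maybe".
        if bloom.might_contain(key) and key in data:
            return data[key]
    return None


bloom = BloomFilter()
bloom.add("gokhan@")
sstables = [(bloom, {"gokhan@": "542"})]
print(read("gokhan@", {}, sstables))                 # 542
print(read("aylin@", {"aylin@": "532"}, sstables))   # 532
```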

CONSISTENCY LEVELS
➤ Formula for Strong Consistency: R + W > N (R = replicas read, W = replicas written, N = replication factor)
➤ ANY (write only): at least one node
➤ ONE, TWO, THREE: at least one/two/three replica nodes
➤ QUORUM: a quorum (N/2+1, with N/2 rounded down) of replica nodes across all datacenters
➤ LOCAL_QUORUM: a quorum (N/2+1) of replica nodes in the same datacenter
➤ ALL: all replica nodes
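
The quorum size and the strong-consistency formula are simple enough to check by hand. The helper names quorum and is_strongly_consistent below are made up for this sketch.

```python
def quorum(n):
    """Quorum size for n replicas: N/2 + 1, using integer division."""
    return n // 2 + 1


def is_strongly_consistent(r, w, n):
    """True when every read set overlaps every write set: R + W > N."""
    return r + w > n


n = 3                # replication factor
q = quorum(n)
print(q)                                   # 2
print(is_strongly_consistent(q, q, n))     # True  (QUORUM reads and writes)
print(is_strongly_consistent(1, 1, n))     # False (ONE reads and writes)
print(is_strongly_consistent(1, n, n))     # True  (ALL writes, ONE reads)
```

With a replication factor of 3, QUORUM reads and QUORUM writes touch 2 replicas each, so at least one replica in every read has seen the latest write.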

CASSANDRA QUERY LANGUAGE (CQL)

➤ Create a Keyspace (Database): 
create keyspace demo with replication = { 'class' :
'SimpleStrategy', 'replication_factor' :1 };
➤ Remove a keyspace: 
drop keyspace demo;
➤ Select a keyspace to operate: 
use demo;

➤ Create a table: 
create table demo.democlients ( email text, name text,
phone text, primary key (email, name));
➤ Alter a table: 
alter table democlients add money int;
➤ Remove a table: 
drop table democlients;
➤ Remove all rows in a table: 
truncate table democlients;
➤ In this table, email is the partition key and name is the clustering key.

➤ Retrieve rows: 
select * from democlients where name='Gokhan Atil'
ALLOW FILTERING; -- or create a secondary index
➤ Retrieve distinct values: 
select DISTINCT email from democlients;
➤ Limit the number of rows returned: 
select * from democlients LIMIT 1;
➤ Sort the result: 
select * from democlients where email='gokhan at gokhanatil.com'
ORDER BY name DESC;
➤ ORDER BY sorts on the clustering key (name) and requires the partition key (email) in the WHERE clause.

➤ Retrieve the results in the JSON format: 
select JSON * from democlients;
➤ Insert a row: 
insert into democlients (email, name, phone) values
('gokhan at gokhanatil.com','Gokhan Atil','542' ) IF NOT
EXISTS;
➤ Insert a row with TTL (Time to live - seconds): 
insert into democlients (email, name, phone) values
('info at gokhanatil.com','Information','542' ) USING TTL 10;

➤ Update records: 
update democlients set phone='535'
where email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF EXISTS;
➤ Update records with a condition: 
update democlients set money=20
where email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF phone='542';
➤ Delete rows: 
delete from democlients
where email='gokhan at gokhanatil.com' IF EXISTS;

➤ Delete row with a condition: 
delete from democlients
where email='gokhan at gokhanatil.com' and name='Gokhan Atil'
IF money > 10;
➤ Delete columns in a row: 
delete money from democlients
where email='gokhan at gokhanatil.com' and name='Gokhan Atil';

CASSANDRA DATA MODELING
➤ Query-Driven Data Modeling
➤ Spread data evenly across the cluster
➤ Use Denormalization
➤ Be careful about using secondary indexes

HOW TO INSTALL AND RUN CASSANDRA?

HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
➤ Make sure you have JDK (8u40 or newer) installed
➤ Download apache-cassandra-VERSION-bin.tar.gz
➤ Extract the file to a folder
➤ Create data and logs directories in the Cassandra folder
➤ Run bin/cassandra
➤ Edit the configuration file (conf/cassandra.yaml)
➤ Give the cluster a name, change the listen address, set the data and log directory locations, and enable authentication and authorization.

HOW TO INSTALL AND RUN CASSANDRA CLUSTER?
➤ Use Docker to pull the latest image: 
docker pull cassandra
➤ Run it as a standalone node: 
docker run --name cas1 -p 9042:9042 -e CASSANDRA_CLUSTER_NAME=MyCluster -d cassandra
➤ Connect using cqlsh: 
docker exec -it cas1 cqlsh
➤ Run nodetool (e.g. to check the status): 
docker exec -it cas1 nodetool status

CASSANDRA NODETOOL
➤ Get a quick summary of the node: 
nodetool info
➤ Get version of Cassandra: 
nodetool version

CASSANDRA NODETOOL
➤ Get status of the cluster/keyspace: 
nodetool status <keyspace_name>
➤ View the network statistics of the node: 
nodetool netstats
➤ Get information about a table (cfstats is named tablestats in newer releases): 
nodetool cfstats <keyspace_name.table_name>

CASSANDRA NODETOOL
➤ Repair a node (you can run it weekly during off-peak hours): 
nodetool repair
➤ Cleanup of keys no longer belonging to a node: 
nodetool cleanup
➤ Start a major compaction process: 
nodetool compact
➤ Check the compaction process: 
nodetool compactionstats

CASSANDRA NODETOOL
➤ Decommission the node you run the command on (it streams its data to the other nodes and leaves the ring): 
nodetool decommission
➤ Remove a dead node from the cluster, identified by its host ID: 
nodetool removenode <node_UUID>
➤ Take a snapshot (for backup): 
nodetool snapshot
➤ Remove previous snapshots: 
nodetool clearsnapshot

BACKUP AND RECOVERY
➤ Back up a cluster:
1. Take a snapshot of each node.
2. Move the snapshots to another storage (S3 bucket?)
3. Clear the snapshots from the nodes
➤ Restore node(s):
➤ Make sure schema exists
➤ Truncate table
➤ Copy the most recent snapshot files into a directory whose path ends in "keyspace/tablename", then run: 
sstableloader -d <nodeip> keyspace/tablename

BUILD A BACKUP NODE
➤ Use multi-DC replication: 
CREATE KEYSPACE "MyKeyspace" 
WITH replication = {  
'class' : 'NetworkTopologyStrategy', 
'datacenter1' : 3, 'datacenter2' : 1 };
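[Figure: datacenter1 (RF=3) serves the client; the single replica in datacenter2 acts as the backup node where snapshots are taken]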

Blog: www.gokhanatil.com Twitter: @gokhanatil
