This slide deck is intended to help the backend service team members of the PM2.5 Open Data Service (pm25.lass-net.org) learn the basics of Apache Cassandra.
Cassandra Community Webinar | In Case of Emergency Break Glass (DataStax)
The design of Apache Cassandra allows applications to provide constant uptime. Its peer-to-peer architecture ensures there are no single points of failure, and its consistency guarantees allow applications to function correctly while some nodes are down. There is also a wealth of information provided by the JMX API and the system log. All of this means that when things go wrong you have the time, information and platform to resolve them without downtime. This presentation will cover some of the common, and not so common, performance issues, failures and management tasks observed in running clusters. Aaron will discuss how to gather information and how to act on it. Operators, developers and managers will all benefit from this exposition of Cassandra in the wild.
What is in All of Those SSTable Files Not Just the Data One but All the Rest ... (DataStax)
Have you ever wondered what is in all of those SSTable files and how they help Cassandra find and manage your data? If you go to the DataStax website you will find a high-level explanation of what is in each file. In this talk we will go much deeper, explaining each file and walking through a dump of its contents. We will also explore the differences between Cassandra 2.1 and 3.4.
About the Speaker
John Schulz, Principal Consultant, The Pythian Group
John has 40 years of experience working with data: data in files and in databases, from flat files through ISAM to relational databases and, most recently, NoSQL. For the last 15 years he has worked on a variety of open-source technologies including MySQL, PostgreSQL, Cassandra, Riak, Hadoop and HBase. He has been working with Cassandra since 2010. For the last eighteen months he has been working for The Pythian Group, helping their customers improve their existing databases and select new ones.
Advanced Apache Cassandra Operations with JMX (zznate)
Nodetool is a command-line interface for managing a Cassandra node. It provides commands for node administration, cluster inspection, table operations and more. The nodetool info command displays node-specific information such as status, load, memory usage and cache details. The nodetool compactionstats command shows compaction status, including active tasks and their progress. The nodetool tablestats command displays statistics for a specific table, including read/write counts, space usage, cache usage and latency.
Some vignettes and advice based on prior experience with Cassandra clusters in live environments. Includes some material from other operational slides.
Understanding Data Partitioning and Replication in Apache Cassandra (DataStax)
This document provides an overview of data partitioning and replication in Apache Cassandra. It discusses how Cassandra partitions data across nodes using configurable strategies like random and ordered partitioning. It also explains how Cassandra replicates data for fault tolerance using a replication factor and different strategies like simple and network topology. The network topology strategy places replicas across racks and data centers. Various snitches help Cassandra determine network topology.
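The partitioning and replica placement described above can be illustrated with a minimal Python sketch. This is a toy model, not Cassandra's implementation: node names are invented, the ring is 0-255 with MD5 instead of Murmur3, and each node owns a single token (real clusters use vnodes). It shows the SimpleStrategy idea of walking clockwise from a key's token and taking the next RF distinct nodes:

```python
import hashlib
from bisect import bisect_right

# Toy token ring: one token per hypothetical node (real Cassandra uses vnodes).
nodes = {"node-a": 0, "node-b": 85, "node-c": 170}
ring = sorted((token, name) for name, token in nodes.items())
tokens = [t for t, _ in ring]

def token_for(key: str) -> int:
    # Hash the partition key onto the 0-255 ring (Cassandra uses Murmur3).
    return hashlib.md5(key.encode()).digest()[0]

def replicas(key: str, rf: int = 2) -> list:
    # SimpleStrategy-style placement: walk clockwise from the key's token
    # and take the next `rf` distinct nodes.
    start = bisect_right(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]
```

NetworkTopologyStrategy refines the same walk by skipping nodes until the replicas span distinct racks and datacenters.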
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan... (DataStax)
Successfully running Apache Cassandra in production often means knowing what configuration settings to change and which ones to leave as default. Over the years the cassandra.yaml file has grown to provide a number of settings that can improve stability and performance. While the file contains plenty of helpful comments, there is more to be said about the settings and when to change them.
In this talk Edward Capriolo, Consultant at The Last Pickle, will break down the parameters in the configuration files, looking at those that are essential to getting started, those that impact performance, those that improve availability, the exotic ones, and the ones that should not be played with. This talk is ideal for anyone from someone setting up Cassandra for the first time to people with deployments in production who wonder what the more exotic configuration options do.
About the Speaker
Edward Capriolo Consultant, The Last Pickle
Long time Apache Cassandra user, big data enthusiast.
Apache Cassandra operations have a reputation for being simple on single-datacenter deployments and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters with high volume and/or high throughput: basic Apache Cassandra operations such as repairs, compactions or hint delivery can have dramatic consequences even on a healthy high-latency multi-datacenter cluster.
In this presentation, Julien will first go through Apache Cassandra multi-datacenter concepts, then show multi-datacenter operations essentials in detail: bootstrapping new nodes and/or datacenters, repair strategy, Java GC tuning, OS tuning, and Apache Cassandra configuration and monitoring.
Based on his three years of experience managing a multi-datacenter cluster on Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent or mitigate issues related to basic Apache Cassandra operations in a multi-datacenter cluster.
1. Cassandra is a decentralized structured storage system designed for scalability and high availability without single points of failure.
2. It uses consistent hashing to partition data across nodes and provide high availability, and an anti-entropy process to detect and repair inconsistencies between nodes.
3. Clients can specify consistency levels for reads and writes, with different levels balancing availability and consistency. The quorum protocol is used to achieve consistency when replicating data across nodes.
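The quorum arithmetic behind point 3 is simple enough to state in a few lines of Python. This is a sketch of the rule, not driver code: a quorum is a strict majority of the replication factor, and a read is guaranteed to see the latest acknowledged write whenever the read and write replica sets must overlap, i.e. R + W > RF:

```python
def quorum(rf: int) -> int:
    # A quorum is a strict majority of the replicas.
    return rf // 2 + 1

def is_strongly_consistent(rf: int, write_cl: int, read_cl: int) -> bool:
    # Read and write replica sets overlap in at least one node when R + W > RF,
    # so the read always touches a replica holding the latest write.
    return read_cl + write_cl > rf

# With RF=3: QUORUM writes (2) + QUORUM reads (2) overlap; ONE/ONE does not,
# trading consistency for availability and latency.
```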
Cassandra by example - the path of read and write requests (grro)
This article describes how Cassandra handles and processes requests. It will help you to get a better impression about Cassandra's internals and architecture. The path of a single read request as well as the path of a single write request will be described in detail.
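A heavily simplified Python sketch of the two paths can make the moving parts concrete. This is a toy single-node model (all names and structures are illustrative, not Cassandra's classes): a write is appended to the commitlog for durability and applied to the memtable; a flush turns the memtable into an immutable SSTable; a read collates candidates from the memtable and all SSTables and the newest timestamp wins:

```python
import itertools

_clock = itertools.count()  # deterministic stand-in for write timestamps
commitlog = []              # durable, append-only log
memtable = {}               # in-memory map: key -> (timestamp, value)
sstables = []               # immutable "on-disk" tables produced by flushes

def write(key, value):
    ts = next(_clock)
    commitlog.append((ts, key, value))  # 1. append to the commitlog for durability
    memtable[key] = (ts, value)         # 2. apply to the memtable

def flush():
    # The memtable is written out as an immutable SSTable and cleared.
    sstables.append(dict(memtable))
    memtable.clear()

def read(key):
    # Collate versions from the memtable and every SSTable; newest wins.
    candidates = [t[key] for t in [memtable] + sstables if key in t]
    return max(candidates)[1] if candidates else None
```

In real Cassandra the coordinator also routes the request to replicas and waits for the requested consistency level; this sketch covers only one node's local path.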
Have you recently started working with Spark, and do your jobs take forever to finish? This presentation is for you.
Himanshu Arora and Nitya Nand YADAV have gathered numerous best practices, optimizations and tunings that they have applied in production over the years to make their jobs faster and less resource-hungry.
In this presentation, they teach us advanced Spark optimization techniques, data serialization formats, storage formats, hardware optimizations, control over parallelism, resource manager settings, better data locality, GC tuning and more.
They also show us the appropriate use of RDD, DataFrame and Dataset in order to fully benefit from Spark's internal optimizations.
This document provides an agenda and introduction for a presentation on Apache Cassandra and DataStax Enterprise. The presentation covers an introduction to Cassandra and NoSQL, the CAP theorem, Apache Cassandra features and architecture including replication, consistency levels and failure handling. It also discusses the Cassandra Query Language, data modeling for time series data, and new features in DataStax Enterprise like Spark integration and secondary indexes on collections. The presentation concludes with recommendations for getting started with Cassandra in production environments.
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim... (DataStax)
Ooyala has been using Apache Cassandra since version 0.4. Their data ingest volume has exploded since then, and Cassandra has scaled along with it. In this webinar, Al will share lessons he has learned across an array of topics from an operational perspective, including how to manage, tune, and scale Cassandra in a production environment.
Speaker: Al Tobey, Tech Lead, Compute and Data Services at Ooyala
Al Tobey is Tech Lead of the Compute and Data services team at Ooyala. His team develops and operates Ooyala's internal big data platform, consisting of Apache Cassandra, Hadoop, and internally developed tools. When not in front of a computer, Al is a father, husband, and trombonist.
Building Apache Cassandra clusters for massive scale (Alex Thompson)
Covering theory and operational aspects of bringing up Apache Cassandra clusters, this presentation can be used as a field reference. Presented by Alex Thompson at the Sydney Cassandra Meetup.
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick... (DataStax)
Making sure your data model will work on the production cluster after six months as well as it does on your laptop is an important skill. It's one that we use every day with our clients at The Last Pickle, and one that relies on tools like cassandra-stress. Knowing how the data model will perform under stress once it has been loaded with data can prevent expensive rewrites late in the project.
In this talk Christopher Batey, Consultant at The Last Pickle, will shed some light on how to use the cassandra-stress tool to test your own schema, graph the results and even extend the tool for your own use cases. While this might be called premature optimisation for an RDBMS, a successful Cassandra project depends on its data model.
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part-time consultant at The Last Pickle, where he works with clients to help them succeed with Apache Cassandra, as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: untested software, code ownership. You can check out his blog at: http://www.batey.info
Apache Cassandra operations have a reputation for being simple on single-datacenter deployments and/or low-volume clusters, but they become far more complex on high-latency multi-datacenter clusters with high volume and/or high throughput: basic Apache Cassandra operations such as repairs, compactions or hint delivery can have dramatic consequences even on a healthy high-latency multi-datacenter cluster.
In this presentation, Julien will first go through Apache Cassandra multi-datacenter concepts, then show multi-datacenter operations essentials in detail: bootstrapping new nodes and/or datacenters, repair strategy, Java GC tuning, OS tuning, and Apache Cassandra configuration and monitoring.
Based on his three years of experience managing a multi-datacenter cluster on Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent or mitigate issues related to basic Apache Cassandra operations in a multi-datacenter cluster.
About the Speaker
Julien Anguenot VP Software Engineering, iland Internet Solutions, Corp
Julien currently serves as iland's Vice President of Software Engineering. Prior to joining iland, Mr. Anguenot held tech leadership positions at several open source content management vendors and tech startups in Europe and in the U.S. Julien is a long-time open source software advocate, contributor and speaker: a Zope, ZODB and Nuxeo contributor and a member of the Zope and OpenStack foundations, his talks include ApacheCon, Cassandra Summit, OpenStack Summit, The WWW Conference and EuroPython.
MySQL Cluster provides high availability through data replication across multiple nodes, automatic failover, and synchronous replication to ensure data integrity, but it has limitations in that the entire database must reside in memory and database size is restricted by available memory. Other options for high availability with MySQL include using MySQL proxy to split reads and writes across nodes, replication with multi-master setups, and technologies like DRBD to replicate data for recovery. Planning for failures, keeping implementations simple, and separating data and connectivity high availability are important principles for highly available MySQL architectures.
A brief history of Instagram's adoption cycle of the open-source distributed database Apache Cassandra, in addition to details about its use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
Co-Founder and CTO of Instaclustr, Ben Bromhead's presentation at the Cassandra Summit 2016, in San Jose.
This presentation will show how to create truly elastic Cassandra deployments on AWS, allowing you to scale and shrink your large Cassandra deployments multiple times a day. By leveraging a combination of EBS-backed disks, JBOD, token pinning and our previous work on bootstrapping from backups, you will be able to dramatically reduce costs per cluster by scaling to match your daily workloads.
Tech Talk: Best Practices for Data Modeling (ScyllaDB)
When we think about database performance, data modeling shouldn't be overlooked; the way data is written and retrieved dictates how fast your system can operate. Because Scylla is a non-relational database, its data model focuses on application queries to build the most efficient data structure. Adapting to a new data modeling mindset can be done pragmatically by understanding new database concepts and how they apply to Scylla.
In this webinar you will learn about:
- Scylla data model and basic CQL concepts
- Primary and Clustering key selection
- Collections and User-Defined Types
- Problem finding techniques
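The primary/clustering key distinction at the heart of this kind of data modeling can be sketched with a toy in-memory model in Python. The table and column names below are invented for illustration; the point is that the partition key selects a partition, while the clustering key keeps rows inside it sorted, so a time-range query is a cheap contiguous slice of one partition:

```python
from bisect import insort

# Toy model of a wide-partition table such as:
#   CREATE TABLE readings (sensor_id text, ts int, pm25 float,
#                          PRIMARY KEY ((sensor_id), ts))
# Partition key (sensor_id) picks the partition; clustering key (ts) orders rows.
partitions = {}

def insert(sensor_id, ts, pm25):
    # insort keeps each partition's rows sorted by the clustering key.
    insort(partitions.setdefault(sensor_id, []), (ts, pm25))

def query(sensor_id, ts_from, ts_to):
    # Efficient query shape: one partition, one contiguous clustering-key slice.
    return [(t, v) for t, v in partitions.get(sensor_id, []) if ts_from <= t <= ts_to]
```

Queries that do not start from a partition key would have to scan every partition, which is why these databases push you to model tables around your application's queries.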
This document provides an overview of Cassandra, a decentralized, distributed database management system. It discusses why the author's company chose Cassandra over other options like HBase and MySQL for their real-time data needs. The document then covers Cassandra's data model, architecture, data partitioning, replication, and other key aspects like writes, reads, deletes, and compaction. It also notes some limitations of Cassandra and provides additional resource links.
Cassandra is the dominant data store used at Netflix, and its health is critical to many of its services. In this talk we will share details of the recent redesign of our health monitoring system and how we leveraged a reactive stream processing system to give us a real-time view of our entire fleet while dramatically improving accuracy and reducing false alarms in our alerting.
About the Speaker
Jason Cacciatore Senior Software Engineer, Netflix
Jason Cacciatore is a Senior Software Engineer at Netflix, where he's been working for the past several years. He's interested in stateful distributed systems and has a diverse background in technology. In his spare time he enjoys spending time with his wife and two sons, reading non-fiction, and watching Netflix documentaries.
This document provides an overview of Cassandra's read and write paths. It describes the core components involved, including memtables, SSTables, commitlog, cache service, column family store, and more. It explains how writes are applied to the commitlog and memtable and how reads merge data from memtables and SSTables using the collation controller.
Cassandra Backups and Restorations Using Ansible (Joshua Wickman, Knewton) | ... (DataStax)
A solid backup strategy is a DBA's bread and butter. Cassandra's nodetool snapshot makes it easy to back up the SSTable files, but there remains the question of where to put them and how. Knewton's backup strategy uses Ansible for distributed backups and stores them in S3.
Unfortunately, it's all too easy to store backups that are essentially useless due to the absence of a coherent restoration strategy. This problem proved much more difficult and nuanced than taking the backups themselves. I will discuss Knewton's restoration strategy, which again leverages Ansible, yet I will focus on general principles and pitfalls to be avoided. In particular, restores necessitated modifying our backup strategy to generate cluster-wide metadata that is critical for a smooth automated restoration. Such pitfalls indicate that a restore-focused backup design leads to faster and more deterministic recovery.
About the Speaker
Joshua Wickman Database Engineer, Knewton
Dr. Joshua Wickman is currently part of the database team at Knewton, a NYC tech company focused on adaptive learning. He earned his PhD at the University of Delaware in 2012, where he studied particle physics models of the early universe. After a brief stint teaching college physics, he entered the New York tech industry in 2014 working with NoSQL, first with MongoDB and then Cassandra. He was certified in Cassandra at his first Cassandra Summit in 2015.
This document provides an overview of Cassandra, including:
- Cassandra is a distributed, column-oriented database that is highly scalable and has no single point of failure.
- It compares Cassandra to relational databases, noting Cassandra's flexible schema and lack of joins.
- The architecture includes keyspaces, tables and columns, with replication specified at the keyspace level.
- Queries in Cassandra Query Language (CQL) have limitations compared to other databases.
The document compares two methods for limiting CPU usage of databases on the same server: instance caging and processor_group_name binding. It provides facts about how each method works, observations on performance differences, and examples of customer cases where each method may be best. Instance caging allows limiting CPU count online but the SGA is interleaved, while binding groups databases to specific CPUs requiring a restart but keeps the SGA local. The best choice depends on factors like database count and whether guaranteed CPU resources are needed for some databases.
ScyllaDB: What could you do with Cassandra compatibility at 1.8 million reque... (Data Con LA)
Scylla is a new, open-source NoSQL data store with a novel design optimized for modern hardware, capable of 1.8 million requests per second per node, while providing Apache Cassandra compatibility and scaling properties. While conventional NoSQL databases suffer from latency hiccups, expensive locking, and low throughput due to low processor utilization, the Scylla design is based on a modern shared-nothing approach. Scylla runs multiple engines, one per core, each with its own memory, CPU and multi-queue NIC. The result is a NoSQL database that delivers an order of magnitude more performance, with less performance tuning needed from the administrator.
With extra performance to work with, NoSQL projects can have more flexibility to focus on other concerns, such as functionality and time to market. Come for the tech details on what Scylla does under the hood, and leave with some ideas on how to do more with NoSQL, faster.
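The shared-nothing, engine-per-core idea described above boils down to deterministically routing every key to exactly one shard that owns its data, so no locks are shared between cores. A minimal Python sketch of that routing rule (shard count and key names are invented; this is the concept, not Scylla's implementation, which hashes tokens rather than raw keys):

```python
import zlib

N_SHARDS = 4  # e.g. one shard per CPU core

def shard_of(key: str) -> int:
    # Deterministic hash routing: the same key always lands on the same shard,
    # so only that core's engine ever touches its data (no cross-core locking).
    return zlib.crc32(key.encode()) % N_SHARDS

# Each shard has its own private queue/memory; requests are handed off, not shared.
queues = [[] for _ in range(N_SHARDS)]

def submit(key: str, op: str):
    queues[shard_of(key)].append((key, op))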
Speaker bio
Don Marti is technical marketing manager for ScyllaDB. He has written for Linux Weekly News, Linux Journal, and other publications. He co-founded the Linux consulting firm Electric Lichen. Don is a strategic advisor for Mozilla, and has previously served as president and vice president of the Silicon Valley Linux Users Group and on the program committees for Uselinux, Codecon, and LinuxWorld Conference and Expo.
Cassandra by example - the path of read and write requestsgrro
This article describes how Cassandra handles and processes requests. It will help you to get a better impression about Cassandra's internals and architecture. The path of a single read request as well as the path of a single write request will be described in detail.
Vous avez récemment commencé à travailler sur Spark et vos jobs prennent une éternité pour se terminer ? Cette présentation est faite pour vous.
Himanshu Arora et Nitya Nand YADAV ont rassemblé de nombreuses bonnes pratiques, optimisations et ajustements qu'ils ont appliqué au fil des années en production pour rendre leurs jobs plus rapides et moins consommateurs de ressources.
Dans cette présentation, ils nous apprennent les techniques avancées d'optimisation de Spark, les formats de sérialisation des données, les formats de stockage, les optimisations hardware, contrôle sur la parallélisme, paramétrages de resource manager, meilleur data localité et l'optimisation du GC etc.
Ils nous font découvrir également l'utilisation appropriée de RDD, DataFrame et Dataset afin de bénéficier pleinement des optimisations internes apportées par Spark.
This document provides an agenda and introduction for a presentation on Apache Cassandra and DataStax Enterprise. The presentation covers an introduction to Cassandra and NoSQL, the CAP theorem, Apache Cassandra features and architecture including replication, consistency levels and failure handling. It also discusses the Cassandra Query Language, data modeling for time series data, and new features in DataStax Enterprise like Spark integration and secondary indexes on collections. The presentation concludes with recommendations for getting started with Cassandra in production environments.
Cassandra Community Webinar | Practice Makes Perfect: Extreme Cassandra Optim...DataStax
Ooyala has been using Apache Cassandra since version 0.4.Their data ingest volume has exploded since 0.4 and Cassandra has scaled along with it. In this webinar, Al will share lessons that he has learned across an array of topics from an operational perspective including how to manage, tune, and scale Cassandra in a production environment.
Speaker: Al Tobey, Tech Lead, Compute and Data Services at Ooyala
Al Tobey is Tech Lead of the Compute and Data services team at Ooyala. His team develops and operates Ooyala's internal big data platform, consisting of Apache Cassandra, Hadoop, and internally developed tools. When not in front of a computer, Al is a father, husband, and trombonist.
Building Apache Cassandra clusters for massive scaleAlex Thompson
Covering theory and operational aspects of bring up Apache Cassandra clusters - this presentation can be used as a field reference. Presented by Alex Thompson at the Sydney Cassandra Meetup.
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax
Making sure your Data Model will work on the production cluster after 6 months as well as it does on your laptop is an important skill. It's one that we use every day with our clients at The Last Pickle, and one that relies on tools like the cassandra-stress. Knowing how the data model will perform under stress once it has been loaded with data can prevent expensive re-writes late in the project.
In this talk Christopher Batey, Consultant at The Last Pickle, will shed some light on how to use the cassandra-stress tool to test your own schema, graph the results and even how to extend the tool for your own use cases. While this may be called premature optimisation for a RDBS, a successful Cassandra project depends on it's data model.
About the Speaker
Christopher Batey Consultant / Software Engineer, The Last Pickle
Christopher (@chbatey) is a part time consultant at The Last Pickle where he works with clients to help them succeed with Apache Cassandra as well as a freelance software engineer working in London. Likes: Scala, Haskell, Java, the JVM, Akka, distributed databases, XP, TDD, Pairing. Hates: Untested software, code ownership. You can checkout his blog at: http://www.batey.info
Apache Cassandra operations have the reputation to be simple on single datacenter deployments and / or low volume clusters but they become way more complex on high latency multi-datacenter clusters with high volume and / or high throughout: basic Apache Cassandra operations such as repairs, compactions or hints delivery can have dramatic consequences even on a healthy high latency multi-datacenter cluster.
In this presentation, Julien will go through Apache Cassandra mutli-datacenter concepts first then show multi-datacenter operations essentials in details: bootstrapping new nodes and / or datacenter, repairs strategy, Java GC tuning, OS tuning, Apache Cassandra configuration and monitoring.
Based on his 3 years experience managing a multi-datacenter cluster against Apache Cassandra 2.0, 2.1, 2.2 and 3.0, Julien will give you tips on how to anticipate and prevent / mitigate issues related to basic Apache Cassandra operations with a multi-datacenter cluster.
1. BASIC STUFF YOU NEED TO KNOW ABOUT CASSANDRA
AUG. 2018
YU-CHANG HO (ANDY)
FORMER RESEARCH ASSISTANT, ACADEMIA SINICA
2. A GREEK STORY
➡An Ancient Greek Prophet
➡Second-most beautiful woman in the world
➡Gift of Prophecy from Apollo
➡Figure of Tragedy
‣ Ref. https://www.wikiwand.com/en/Cassandra
3. APACHE CASSANDRA
WHAT IS APACHE CASSANDRA (C*)?
▸ Originated at Facebook Inc.
▸ Combines the concept of Google BigTable & Amazon Dynamo.
▸ Data Modeling: BigTable
▸ System Architecture: Dynamo
▸ A distributed database system with high scalability.
▸ Written in Java (The JVM Tuning Hell!!).
Ref. https://www.wikiwand.com/en/Apache_Cassandra
Ref. https://www.wikiwand.com/en/Dynamo_(storage_system)
Ref. https://www.wikiwand.com/en/Bigtable
4. APACHE CASSANDRA
WHAT IS APACHE CASSANDRA (C*)?- CONT.
▸ It is a popular database system! (Ranked in 2018)
1 Oracle
2 MySQL
3 Microsoft SQL Server
4 PostgreSQL
5 MongoDB
6 DB2
7 Redis
8 Elasticsearch
9 Microsoft Access
10 Cassandra
Ref. https://db-engines.com/en/ranking
5. APACHE CASSANDRA
WHAT IS APACHE CASSANDRA (C*)?- CONT.
▸ There is no master/slave relationship among C* nodes!
▸ Every node can serve both reads and writes.
▸ In our scenario, we assume the GCP node to be the “Master” that controls data insertion.
▸ The version we currently use: 3.11.2.
6. APACHE CASSANDRA
THE CAP THEOREM
▸ Eric Brewer, UC Berkeley
▸ C: Consistency
▸ A: Availability
▸ P: Partition-tolerance
▸ A distributed system cannot fully satisfy all three parts of CAP at the same time.
Ref. https://www.wikiwand.com/en/CAP_theorem
7. APACHE CASSANDRA
THE CAP THEOREM OF CASSANDRA
▸ C: The consistency of data → Eventual consistency
▸ A: The availability of service → Always available
▸ P: Ability to distribute that load effectively → High Scalability
▸ Still, we can try to satisfy all three parts by tuning the consistency level for reads/writes (R/W).
▸ C* provides high availability together with a tunable level of consistency.
Ref. https://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed
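The tuning idea above has a simple rule behind it, which we can sketch in a few lines of Python (a toy illustration, not Cassandra code): if a write waits for W replicas and a read consults R replicas out of RF total, the two sets are guaranteed to overlap whenever R + W > RF, so the read sees the latest write.

```python
def is_strongly_consistent(rf: int, r: int, w: int) -> bool:
    """True if every read quorum must intersect every write quorum."""
    return r + w > rf

# QUORUM reads + QUORUM writes with RF = 3: 2 + 2 > 3 -> strong consistency.
assert is_strongly_consistent(3, 2, 2)
# ONE read + ONE write with RF = 3: 1 + 1 <= 3 -> only eventual consistency.
assert not is_strongly_consistent(3, 1, 1)
```

This is why QUORUM/QUORUM is a common choice: it keeps the overlap guarantee while still tolerating one node failure at RF = 3.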
11. APACHE CASSANDRA
TERMINOLOGY YOU NEED TO UNDERSTAND- CONT.
▸ Keyspace: the counterpart of a database in MySQL; a container for tables (historically called “column families”, a term borrowed from BigTable)
▸ Table: Just table, don’t be confused! :-)
▸ MemTable: Cassandra will first store in memory. After a
certain among of data is reached, flush to disk (SSTable).
▸ Commit_log: Not only store in memory, C* will also first
create a log for those new data to prevent from failure and
is able to restore those data if bad thing happens.
▸ SSTable: The compressed files of data stored in disk.
12. APACHE CASSANDRA
TERMINOLOGY YOU NEED TO UNDERSTAND- CONT.
▸ Replica: a copy of the data.
▸ Replication Factor (RF): the number of replicas you wish to maintain in a given data center.
▸ Partitioner: determines how data is distributed across the nodes in the cluster (a token created by hashing the partition key).
▸ Coordinator: the role a node takes when it receives a query. It locates the data among the nodes; on each node, the MemTable and SSTables are checked.
13. APACHE CASSANDRA
TERMINOLOGY YOU NEED TO UNDERSTAND- CONT.
▸ Gossip Protocol: the protocol a C* node uses to discover information about other nodes.
▸ Seed Node: a node that mainly keeps the topology information (new nodes contact it first).
▸ Currently, our seed nodes are GCP (TW), UCSD (US), and NTU (JP).
▸ Snitch: the mechanism a C* node uses to map IPs to racks and data centers (the topology).
▸ A snitch is especially useful when performing a read.
▸ It builds the topology and helps decide which node to query.
14. APACHE CASSANDRA
TERMINOLOGY YOU NEED TO UNDERSTAND- CONT.
▸ Consistency Level (CL): how much consistency a query must achieve.
ANY Lowest level. Even if all replica nodes are down, the write can still succeed.
ONE At least one replica node must succeed.
QUORUM (RF / 2) + 1 nodes must succeed.
ALL Highest level; every replica must succeed.
LOCAL_ONE For multiple data centers. One node in a given data center must succeed.
LOCAL_QUORUM For multiple data centers. See QUORUM (applied within the local data center).
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html
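The QUORUM formula above is integer math, which is easy to get wrong by hand. A quick sketch of the quorum sizes, including the per-DC variant used by LOCAL_QUORUM (the DC names and RF values are hypothetical):

```python
# Quorum size per the formula on this slide: (RF / 2) + 1, integer division.
def quorum(rf: int) -> int:
    return rf // 2 + 1

# LOCAL_QUORUM only counts replicas in the coordinator's own data center.
def local_quorum(rf_by_dc: dict, local_dc: str) -> int:
    return quorum(rf_by_dc[local_dc])

for rf in (1, 2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}")
# RF=3 tolerates one replica down at QUORUM; RF=5 tolerates two.

rf = {"DC1": 3, "DC2": 2}  # hypothetical multi-DC layout
print(local_quorum(rf, "DC1"))  # 2
```

Note that RF=2 gives quorum=2, i.e. no failure tolerance at QUORUM, which is why odd RFs are usually preferred.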
15. APACHE CASSANDRA
TERMINOLOGY YOU NEED TO UNDERSTAND- CONT.
▸ Compaction: merges SSTables, purging deleted data and rewriting the remaining data into new SSTables.
▸ When performing a repair, SSTable rebuild, or cleanup, you might see C* running compactions in order to make the data consistent.
▸ Tombstone: deletion is not done the usual way. A delete is performed as an insertion (a marker that flags the data as deleted).
▸ gc_grace_seconds: a period of time during which C* ensures all nodes have received the tombstone info. (Default: 10 days)
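The interaction between tombstones, gc_grace_seconds, and compaction can be sketched in a few lines (a toy model, not Cassandra code; the cell keys are made-up examples):

```python
# Sketch of tombstone handling: a delete writes a marker; compaction may
# only drop the marker after gc_grace_seconds, so every replica has had
# time to learn about the delete (otherwise the value could "rise again").
import time

GC_GRACE_SECONDS = 10 * 24 * 3600  # default: 10 days

def compact(cells, now):
    """Keep live cells, and tombstones still inside the grace period."""
    kept = []
    for cell in cells:
        if cell["tombstone"] and now - cell["ts"] > GC_GRACE_SECONDS:
            continue  # safe to purge: the grace period has passed
        kept.append(cell)
    return kept

now = time.time()
cells = [
    {"key": "pm25:site1", "tombstone": True,  "ts": now - 11 * 24 * 3600},
    {"key": "pm25:site2", "tombstone": True,  "ts": now - 3600},
    {"key": "pm25:site3", "tombstone": False, "ts": now - 3600},
]
print([c["key"] for c in compact(cells, now)])  # site1's tombstone is purged
```

This is also why every node must be repaired within gc_grace_seconds: a node that misses the tombstone for longer than that can resurrect deleted data.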
17. APACHE CASSANDRA
CASSANDRA CONFIGURATION
cluster_name <cluster name>
listen_interface <ethernet interface name>
listen_address <the IP address on the main interface>
authenticator PasswordAuthenticator
authorizer CassandraAuthorizer
endpoint_snitch GossipingPropertyFileSnitch
seeds <the seed server address>
broadcast_address <External IP address>
permissions_validity_in_ms 20000
concurrent_reads 16 * num. of disk used by data_file_directories
concurrent_writes 8 * num. of cores
concurrent_counter_writes 16 * num. of disk used by data_file_directories
streaming_keep_alive_period_in_secs 3600 (1hr)
read_request_timeout_in_ms 10000
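The concurrent_* rules in the table above are simple arithmetic over the hardware. A sketch with assumed example numbers (2 data disks, 8 cores; substitute your machine's real values):

```python
# The sizing rules quoted above, as arithmetic.
# data_disks / cores are hypothetical example values.
data_disks = 2  # disks backing data_file_directories
cores = 8

print("concurrent_reads:", 16 * data_disks)           # 32
print("concurrent_writes:", 8 * cores)                # 64
print("concurrent_counter_writes:", 16 * data_disks)  # 32
```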
18. APACHE CASSANDRA
CASSANDRA CONFIGURATION- CONT.
listen_interface <ethernet interface name>
listen_address <the IP address on the main interface>
broadcast_address <External IP address>
▸ Most of our machines are VMs, which may sit behind a local DHCP environment. The main interface might listen on a local IP, say 192.168.xxx.xxx.
▸ In that case, you need to set broadcast_address so the other nodes can find the node you are going to add.
19. APACHE CASSANDRA
CASSANDRA CONFIGURATION- CONT.
▸ authenticator / authorizer: a pair of settings for Cassandra account management.
▸ (PasswordAuthenticator / CassandraAuthorizer) is a fixed pair; don't change them.
▸ endpoint_snitch: which kind of snitch you would like to use.
▸ GossipingPropertyFileSnitch: you need to modify cassandra-rackdc.properties to use this snitch.
authenticator PasswordAuthenticator
authorizer CassandraAuthorizer
endpoint_snitch GossipingPropertyFileSnitch
20. APACHE CASSANDRA
CASSANDRA CONFIGURATION- CONT.
▸ permissions_validity_in_ms: how long to cache the authorization info.
permissions_validity_in_ms 20000
concurrent_reads 16 * num. of disk used by data_file_directories
concurrent_writes 8 * num. of cores
concurrent_counter_writes 16 * num. of disk used by data_file_directories
▸ concurrent_*: Hardware resource dependent.
21. APACHE CASSANDRA
CASSANDRA CONFIGURATION- CONT.
▸ Some machines have higher network latency; these settings help prevent Cassandra from timing out.
streaming_keep_alive_period_in_secs 3600 (1hr)
read_request_timeout_in_ms 10000
22. APACHE CASSANDRA
CASSANDRA CONFIGURATION- CONT.
▸ Still a lot of configuration to learn and discover!
▸ Lots of comments are available in cassandra.yaml. Check them out when you have time.
24. APACHE CASSANDRA
HOW IS DATA WRITTEN?
1. Write data to the MemTable (memory) & log it in the commit_log (disk)
‣ Durable writes: failure tolerance!
2. Flush data from the MemTable
‣ commitlog_total_space_in_mb: threshold that triggers a flush
3. Store data on disk in SSTables
25. APACHE CASSANDRA
HOW IS DATA WRITTEN?- CONT.
‣ The commit_log is replayed on restart. This is why a Cassandra reboot sometimes takes longer and sometimes shorter: it depends on how much data must be replayed.
[Diagram: a write request goes to the MemTable (memory) and the commit_log (disk); the MemTable is flushed to SSTables (committed data) on disk]
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlHowDataWritten.html
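The write path above can be sketched as a toy model: append to the commit log first (durability), update the MemTable (speed), and flush to an immutable "SSTable" past a threshold. Not Cassandra code; the flush threshold is an arbitrary example:

```python
# Toy model of the C* write path: commit log -> MemTable -> flush to SSTable.
FLUSH_THRESHOLD = 3  # arbitrary; Cassandra flushes on memory/commit-log size

class ToyNode:
    def __init__(self):
        self.commit_log = []   # stands in for the on-disk commit log
        self.memtable = {}     # in-memory writes
        self.sstables = []     # immutable flushed segments

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        self.sstables.append(dict(self.memtable))  # immutable snapshot
        self.memtable.clear()
        self.commit_log.clear()  # flushed data no longer needs replay

node = ToyNode()
for i in range(4):
    node.write(f"k{i}", i)
print(len(node.sstables), node.memtable)  # 1 {'k3': 3}
```

On restart, anything still in the toy commit_log would be replayed into the MemTable, which is exactly the replay step described on this slide.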
26. APACHE CASSANDRA
HOW IS DATA WRITTEN?- CONT.
‣ Note that it is recommended to keep the commit_log and SSTables on different disks.
‣ If possible, attach at least 3 hard disk drives to your machine. (SSDs are more than welcome!)
Ref. https://wiki.apache.org/cassandra/PerformanceTuning
27. APACHE CASSANDRA
HOW IS DATA READ?
▸ The coordinator finds which node(s) to ask for the required data.
▸ On the responsible node:
▸ Try to find the data in the MemTable first.
▸ Then look for the data in the compressed SSTable files.
▸ Combine the results (from MemTable & SSTables) and return them to the coordinator.
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html
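The merge step can be sketched as follows (a toy model; real Cassandra merges by per-cell write timestamp and uses bloom filters to skip SSTables, which this ignores):

```python
# Toy read path: check the MemTable first, then SSTables from newest
# to oldest; the most recent write for the key wins.
def read(key, memtable, sstables):
    if key in memtable:
        return memtable[key]
    for sstable in reversed(sstables):  # newest flushed segment first
        if key in sstable:
            return sstable[key]
    return None  # key not found on this node

memtable = {"a": 3}
sstables = [{"a": 1, "b": 1}, {"b": 2}]  # oldest to newest
print(read("a", memtable, sstables))  # 3 (MemTable shadows older SSTables)
print(read("b", memtable, sstables))  # 2 (newest SSTable wins)
```

This is why a key's read cost grows with the number of SSTables that contain it, and why compaction (merging SSTables) also speeds up reads.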
28. APACHE CASSANDRA
HOW IS DATA DELETED?
▸ Keep in mind that this is a large-scale distributed system. Careless deletion can harm consistency.
▸ Deletion as insertion: the Tombstone.
▸ gc_grace_seconds: prevents party-rock zombies!!
▸ Compaction actually "clears" the data.
▸ You may assign a TTL to a data row!
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html
Everyday I’m shuffling!
31. APACHE CASSANDRA
REPLICA, REPLICATION FACTOR (RF)
▸ How to determine the placement of replica?
▸ SimpleStrategy & NetworkTopologyStrategy
▸ SimpleStrategy: places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring, without considering topology.
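The clockwise walk is simple modular arithmetic. A minimal sketch (the ring and node names are hypothetical, and real rings are ordered by token, not by name):

```python
# Sketch of SimpleStrategy replica placement: the partitioner picks the
# first replica's position; the remaining RF-1 replicas are the next
# nodes clockwise, ignoring racks and data centers.
def simple_strategy_replicas(first_index, ring, rf):
    return [ring[(first_index + i) % len(ring)] for i in range(rf)]

ring = ["node-a", "node-b", "node-c", "node-d"]  # token order, hypothetical
print(simple_strategy_replicas(3, ring, 3))  # ['node-d', 'node-a', 'node-b']
```

Because it ignores topology, SimpleStrategy can put all replicas in one rack, which is why NetworkTopologyStrategy (next slide) is preferred for real deployments.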
32. APACHE CASSANDRA
REPLICA, REPLICATION FACTOR (RF)- CONT.
▸ NetworkTopologyStrategy: requires setting the RF for each data center.
▸ NetworkTopologyStrategy: places replicas in the same data center by walking the ring clockwise until reaching the first node in another rack.
ALTER KEYSPACE <keyspace> WITH REPLICATION
= {'class': 'NetworkTopologyStrategy',
'DC1': <num>, 'DC2': <num>}
AND durable_writes = true;
37. APACHE CASSANDRA
REPLICATION- CONT.
▸ It’s all about fault-tolerance (Availability).
▸ Enable the system to continue working even though there
are some node is not available.
▸ Fault-tolerance in the level of data center, rack.
▸ Do not let RF > {NUM. OF NODES IN A DC}!!!
▸ Always remember to increase the RF of system_auth
keyspace before you add a new node!!!
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/architecture/archDataDistributeReplication.html
38. APACHE CASSANDRA
HINTED-HANDOFF
▸ The process that helps a dead node recover the writes it missed.
▸ The other nodes keep the data for the dead node for a certain period of time. When the node comes back online, they stream the stored writes to the revived node.
▸ Default hint window: 3 hours (max_hint_window_in_ms). We should deal with a dead node and bring it back within this period; beyond it, a repair is needed.
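The handoff logic can be sketched as a toy coordinator (not Cassandra code; the node name and mutation tuples are made-up examples):

```python
# Sketch of hinted handoff: while a replica is down, the coordinator
# stores hints for it (up to the hint window); when the replica comes
# back, the stored writes are replayed to it.
HINT_WINDOW_SECONDS = 3 * 3600  # max_hint_window_in_ms defaults to 3 hours

class Coordinator:
    def __init__(self):
        self.hints = {}  # node -> list of missed writes

    def write(self, node, mutation, node_down_for):
        if node_down_for == 0:
            return "delivered"
        if node_down_for > HINT_WINDOW_SECONDS:
            return "dropped"  # down too long: only repair can fix it now
        self.hints.setdefault(node, []).append(mutation)
        return "hinted"

    def replay(self, node):
        """Called when the node rejoins: hand it everything it missed."""
        return self.hints.pop(node, [])

c = Coordinator()
c.write("ntu-jp", ("k1", "v1"), node_down_for=600)  # node briefly down
print(c.replay("ntu-jp"))  # [('k1', 'v1')]
```

The "dropped" branch is the important one: past the hint window, hints stop accumulating, so a node that was down longer must be repaired, not just restarted.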
39. APACHE CASSANDRA
NODETOOL
▸ A monitoring/management tool for C*.
▸ To operate C*, you should be familiar with this guy.
▸ Refer to: https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsNodetool.html
40. APACHE CASSANDRA
CQLSH
▸ CQL: Cassandra Query Language; looks like traditional SQL commands.
▸ A command shell for interacting with C*.
▸ It looks like this:
41. APACHE CASSANDRA
CQLSH- CONT.
▸ You may alter the settings of an existing keyspace or table using CQLSH, for example changing the RF of a keyspace.
▸ Of course, CQLSH can also be used to create/delete/modify/query keyspaces and tables.
▸ Refer to: https://paper.dropbox.com/doc/Cassandra-Management-Operations--AIIgTHW33s5ArnWYx18kxfU3Ag-AvuMYLwTQhgWUKc6h1sUd#:uid=865346154186617362484552&h2=The-cqlsh-Command
42. APACHE CASSANDRA
THE SYSTEM STATUS CHECK
▸ This command returns the status of all existing nodes.
▸ Status interpretation:
▸ UN (Up/Normal): the node is working properly
▸ DN (Down/Normal): the node is offline
▸ UL (Up/Leaving): the node is leaving the cluster (node removal)
$ nodetool status
43. APACHE CASSANDRA
THE SYSTEM STATUS CHECK- CONT.
▸ This command also tells you each node's data portion per DC, its disk usage, and its UUID.
$ nodetool status
44. APACHE CASSANDRA
THE SYSTEM STATUS CHECK- CONT.
▸ This command shows the listening ports on the machine.
▸ It's a quick way to check whether C* is still online.
▸ Cassandra port usage:
$ netstat -lnt
7000 Gossiping port (unencrypted)
9042 CQLSH/client API communication port
7199 JMX monitoring port
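The same check can be done programmatically; a small hedged sketch (host and port below are examples, and a successful TCP connect only proves something is listening, not that C* is healthy):

```python
# Quick programmatic version of the netstat check: try to connect to a
# port and report whether anything is listening there.
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. is_listening("127.0.0.1", 9042) to check the CQL client port
```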
45. APACHE CASSANDRA
THE SYSTEM STATUS CHECK- CONT.
▸ This command shows the status of the C* process. It should always be in the status "active (running)".
▸ If you see the status "active (exited)", C* has already died due to some error. Check the logs for further information.
$ service cassandra status
48. APACHE CASSANDRA
REPAIR
▸ The process that maintains data consistency across the cluster.
▸ This is the operation that will make you burn the midnight oil……
$ nodetool repair [option]
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsRepair.html
49. APACHE CASSANDRA
REPAIR- CONT.
▸ Full Repair & Incremental Repair
▸ This should be done periodically!
▸ As recommended by the C* docs:
▸ Incremental repair every 1 - 3 days (within GC grace
period)
▸ Full repair every 1 - 3 weeks
Ref. http://cassandra.apache.org/doc/latest/operating/repair.html
50. APACHE CASSANDRA
REPAIR- CONT.
▸ How do you monitor repair progress?? Good question!
▸ The log files
▸ Useful commands:
$ nodetool netstats
# print the status of streaming
$ nodetool compactionstats
# print the status of compaction
$ nodetool tpstats
# show the thread pool statistics
51. APACHE CASSANDRA
SSTABLE CORRUPTION
▸ If a repair fails or the data sync is not performed well, this can happen……
▸ For example, when you see this after a repair is done:
▸ Prepare a cup of coffee; you might need it……. 😨
[2017-05-16 00:26:40,555] Repair session dbbf6510-39ef-11e7-8027-d710f406f829 for range
(-4631786651008530880,-4578496872070625882] failed with error [repair #dbbf6510-39ef-11e7-8027-
d710f406f829 on watchtower_keyspace/release_stages,
(-4631786651008530880,-4578496872070625882]] Validation failed in /xxx.xxx.xxx.xxx (progress: 0%)
52. APACHE CASSANDRA
SSTABLE CORRUPTION- CONT.
▸ All you need to do is run the following on the node with IP xxx.xxx.xxx.xxx:
▸ As with repair, use the same set of nodetool commands to check whether C* is still working.
▸ If everything goes well, try the repair again and hope nothing bad happens again.
$ nodetool scrub
Ref. https://support.datastax.com/hc/en-us/articles/205256895--Validation-failed-when-running-a-nodetool-repair
53. APACHE CASSANDRA
RUNNING OUT OF DISK SPACE! DO YOU PERFORM DELETION?
▸ Remember that, as of now, the Master node has only 100GB of disk space. The data grows approximately 1.xGB each month.
▸ Frequently check the following:
$ nodetool status
# check the data portion and disk usage
$ df -h
# check the real hard disk space usage
54. APACHE CASSANDRA
RUNNING OUT OF DISK SPACE! DO YOU PERFORM DELETION?- CONT.
▸ If C* eats up too much space, you can trigger data cleanup by issuing a repair:
▸ Or you can try clearing the data snapshots:
$ nodetool repair [option]
# repair the data; this will trigger compaction
$ nodetool clearsnapshot
Ref. https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAboutSnapshots.html
55. APACHE CASSANDRA
NEW NODE COMING IN, GREAT!
▸ Make sure the RF of system_auth is increased first.
▸ Perform network connectivity and performance checks.
▸ Refer to here: https://paper.dropbox.com/doc/Cassandra-Management-Operations--AIIgTHW33s5ArnWYx18kxfU3Ag-AvuMYLwTQhgWUKc6h1sUd#:uid=308409713240027648094943&h2=Add-a-New-Node
▸ Bootstrap of a new node might fail; check the log files frequently!
56. APACHE CASSANDRA
NEW NODE COMING IN, GREAT!- CONT.
▸ How can I tell that the bootstrap failed?
▸ The log files (of course!)
▸ nodetool status shows a highly imbalanced data distribution.
▸ It might be a network throughput issue; try to fix it and resume the bootstrap:
$ nodetool bootstrap resume
57. APACHE CASSANDRA
LESS POSSIBLE BUT COULD HAPPEN, NODE DELETION
▸ You might want to remove a node when some issue comes up.
▸ Refer to: https://paper.dropbox.com/doc/Cassandra-Management-Operations--AIIgTHW33s5ArnWYx18kxfU3Ag-AvuMYLwTQhgWUKc6h1sUd#:uid=454006913486500030503564&h2=Delete/Remove-a-Node
▸ If everything goes fine, reduce the RF of system_auth so that it is not larger than the total number of nodes.
58. APACHE CASSANDRA
CASSANDRA OPERATIONS
▸ Too many things to discuss; it is hard to cover them all in this talk.
▸ Please check the doc frequently for further information:
▸ https://paper.dropbox.com/doc/Cassandra-Management-Operations--AIIgTHW33s5ArnWYx18kxfU3Ag-AvuMYLwTQhgWUKc6h1sUd
59. APACHE CASSANDRA
SYSTEM_AUTH & CURRENT CASSANDRA USER ACCOUNT
▸ I keep talking about the system_auth keyspace, so what is it anyway?
▸ system_auth: the keyspace that keeps Cassandra's account info.
▸ If the data in system_auth is inconsistent, authentication might fail on certain nodes. You will see authentication failures for a certain period of time.
▸ Data loss!!!
60. APACHE CASSANDRA
SYSTEM_AUTH & CURRENT CASSANDRA USER ACCOUNT- CONT.
▸ Increasing the RF of system_auth before adding a new node is just the theoretical approach……
▸ Current user accounts in Cassandra:
cassandra Default superuser, now treated as a backup superuser. Has the same password as the iisnrl account.
iisnrl The main superuser.
kairosdb The user for master KairosDB insertion. Non-superuser.
lassgroup The user for participating parties to archive data. Non-superuser.
superuser.