Outline
I. Why Cassandra?
II. Basic Operations
III. The Cassandra Architecture
IV. Clients
V. Maintenance
I. Why Cassandra?
CAP theorem
• Consistency: all nodes see the same data at the same time.
• Availability: a guarantee that every request receives a
response about whether it succeeded or failed.
• Partition Tolerance: the system continues to operate
despite arbitrary message loss or failure of part of the
system.
Ref: http://uzigood.blogspot.com/2016/06/cap-theorem.html
Partition Tolerance of Mongo
Ref: https://docs.mongodb.com/manual/replication/
Ref: https://docs.mongodb.com/manual/core/read-preference/
Partition Tolerance of Cassandra
Cassandra uses consistent hashing to
determine which nodes out of your
cluster must manage the data you are
passing in. You set a replication factor,
which states how many nodes
you want to replicate your data to.
How big can it scale? Cassandra can handle the load of
applications like Instagram that have roughly 80 million
photos uploaded to the database every day.
Ref: https://blog.panoply.io/cassandra-vs-mongodb
Cassandra vs Mongo
Ref: https://scalegrid.io/blog/cassandra-vs-mongodb/
IMS, RDBMSs, NoSQL. The horse, the car, the plane.
II. Basic Operations
Get started
1. Apache Cassandra Cluster: An Apache Cassandra cluster
is a database server spread across a number of
machines.
2. Keyspaces : A keyspace is a logical grouping of Apache
Cassandra tables.
3. Tables : An Apache Cassandra table is similar to an
RDBMS table.
4. Primary Key: A Primary key uniquely identifies an
Apache Cassandra row. A primary key can be a simple
key or a composite key. A composite key is made up of
two parts, a partition key and a clustering key. The partition
key determines data distribution in the cluster while the
clustering key determines sort order within a partition.
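The roles of the two key parts can be sketched in a few lines of Python. This is only an illustration, not a Cassandra API: the function `group_by_partition` and the columns `hotel_id` and `room_number` are hypothetical names chosen for the example.

```python
from collections import defaultdict

def group_by_partition(rows, partition_key, clustering_key):
    """Group rows by partition key, then sort each group by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for key in partitions:
        partitions[key].sort(key=lambda r: r[clustering_key])
    return dict(partitions)

# Hypothetical rows keyed by (hotel_id, room_number).
rows = [
    {"hotel_id": "h1", "room_number": 12},
    {"hotel_id": "h2", "room_number": 3},
    {"hotel_id": "h1", "room_number": 5},
]
partitions = group_by_partition(rows, "hotel_id", "room_number")
```

Rows that share a partition key end up together, and within that group they are ordered by the clustering key.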
Terminology
cqlsh> DESCRIBE CLUSTER;
cqlsh> DESCRIBE KEYSPACES;
cqlsh> CREATE KEYSPACE my_keyspace WITH replication = {
'class': 'SimpleStrategy',
'replication_factor': 1};
cqlsh> USE my_keyspace;
cqlsh:my_keyspace> CREATE TABLE user (
first_name text,
last_name text,
PRIMARY KEY (first_name));
Get started
Ref: http://abiasforaction.net/cassandra-query-language-cql-tutorial/
cqlsh> DESCRIBE KEYSPACES;
cqlsh:my_keyspace> DESCRIBE KEYSPACES;
cqlsh:my_keyspace> DESCRIBE KEYSPACE my_keyspace;
cqlsh:my_keyspace> DESCRIBE TABLE user;
DESCRIBE
INSERT
cqlsh:my_keyspace> INSERT INTO user (first_name , last_name ) VALUES ('ben', 'liu');
cqlsh:my_keyspace> SELECT * FROM user;
cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='ben';
cqlsh:my_keyspace> SELECT COUNT (*) FROM user;
DELETE
cqlsh:my_keyspace> DELETE last_name FROM user WHERE first_name ='ben';
Exercises
1. Create a keyspace named mifly. Use the SimpleStrategy class and set
replication_factor to 1.
2. Create a table named employees with two columns, first_name
and last_name, both of type text. Set first_name as
the primary key of the table.
3. To check that first_name has been set as the primary key, use DESCRIBE to get the
information of employees.
4. Insert the data shown below into employees.
first_name last_name
ben liu
maka long
Exercises
5. Dump all columns and all rows from employees.
6. Delete the employee whose first name is maka.
7. Drop table employees.
8. Drop keyspace mifly.
Cassandra Query Language
Cassandra’s Data Model
cqlsh:my_keyspace> INSERT INTO user (first_name , last_name ) VALUES ( 'doggy', 'wang');
Sparse Table
cqlsh:my_keyspace> UPDATE user SET last_name = 'liu' WHERE first_name ='white' ;
UPDATE
ALTER
ALTER TABLE user ADD phone text ;
ALTER TABLE user DROP phone ;
Timestamps
cqlsh:my_keyspace> SELECT first_name,last_name, writetime(last_name) from user;
Cassandra uses these timestamps for resolving any conflicting changes that are
made to the same value. Generally, the last timestamp wins.
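A minimal Python sketch of the "last timestamp wins" rule, assuming each replica reports a (write timestamp, value) pair. The function `resolve` and the sample timestamps are hypothetical and only illustrate the idea.

```python
def resolve(versions):
    """Return the value carrying the highest write timestamp."""
    return max(versions, key=lambda v: v[0])[1]

# (write timestamp in microseconds, value) pairs as seen on different replicas.
versions = [
    (1_700_000_000_000_001, "liu"),
    (1_700_000_000_000_005, "liou"),
]
winner = resolve(versions)
```

The value written with the later timestamp is the one a reader would see after the conflict is resolved.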
TTL (time to live)
cqlsh:my_keyspace> SELECT first_name, last_name, TTL(last_name) FROM user;
cqlsh:my_keyspace> UPDATE user USING TTL 30 SET last_name='liou' WHERE first_name ='white' ;
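The countdown that TTL() reports can be pictured with a small Python sketch. It assumes times are given in whole seconds; `remaining_ttl` is a hypothetical helper, not part of any Cassandra driver.

```python
def remaining_ttl(written_at, ttl_seconds, now):
    """Seconds left before the value expires, or None once it has expired."""
    remaining = ttl_seconds - (now - written_at)
    return remaining if remaining > 0 else None

# A value written at t=100 with a TTL of 30 seconds, checked at t=110 and t=130.
left_after_10s = remaining_ttl(100, 30, 110)
left_after_30s = remaining_ttl(100, 30, 130)
```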
Exercises
1. Create a keyspace named mifly. Use the SimpleStrategy class and set
replication_factor to 1.
2. Create a table named employees with two columns, first_name
and last_name, both of type text. Set first_name as
the primary key of the table.
3. To check that first_name has been set as the primary key, use DESCRIBE to get the
information of employees.
4. Insert the data shown below into employees. Leave the last_name of feifei empty.
first_name last_name
ben liu
maka long
feifei
Exercises
5. Select feifei and change the value of last_name to king.
6. Add a column of email to the table. The data type of the email column is text.
7. Select first_name, last_name and the TTL of email.
8. Set the email address of ben to mifly@gmail.com and set the TTL to 30s.
9. Drop table employees.
10. Drop keyspace mifly.
CQL Types
cqlsh:my_keyspace> CREATE TABLE user (
first_name text ,
last_name text,
PRIMARY KEY (first_name));
Data Types
first_name (text) last_name (text)
ben liu
maka long
Numeric Data Types
Textual Data Types
Other Simple Data Types
• boolean: This is a simple true/false value.
• blob: A binary large object (blob) is a colloquial computing term for an arbitrary array
of bytes.
• inet: This type represents IPv4 or IPv6 Internet addresses.
• counter: The counter data type provides a 64-bit signed integer whose value cannot be set
directly, but only incremented or decremented.
Time and Identity Data Types
• timestamp: A date and time, stored as a 64-bit signed integer of milliseconds since the epoch. Values can be
entered in ISO 8601 date formats (e.g. 2015-06-15 20:05-0700, 2015-06-15 20:05:07.013-0700).
• date, time: The 2.2 release introduced date and time types that allowed these to be represented
independently.
• uuid: This is a Type 4 UUID (universally unique identifier) which is a 128-bit value based entirely
on random numbers (e.g. 1a6300ca-0572-4736-a393-c0b7229e193e).
• timeuuid: This is a Type 1 UUID, which is based on the MAC address of the computer, the
system time, and a sequence number used to prevent duplicates.
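The two UUID flavors can be tried out with Python's standard `uuid` module. This is an illustration outside of Cassandra; the variable names are arbitrary.

```python
import uuid

random_id = uuid.uuid4()  # version 4: built from random numbers
time_id = uuid.uuid1()    # version 1: built from a timestamp, clock sequence and node id

# Each value is 128 bits and prints as 36 characters including the hyphens.
random_text = str(random_id)
```

The `version` attribute of each object tells the two kinds apart.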
uuid
cqlsh:my_keyspace> ALTER TABLE user ADD id uuid;
cqlsh:my_keyspace> UPDATE user SET id = uuid() WHERE first_name ='ben' ;
Ref: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/timeuuid_functions_r.html
Collections
• set: The set data type stores a collection of elements.
• list: The list data type contains an ordered list of elements.
• map: The map data type contains a collection of key/value pairs.
set
cqlsh:my_keyspace> ALTER TABLE user ADD email set<text> ;
UPDATE user SET email = {'a@email.com', 'b@emai.com'} WHERE first_name ='ben';
UPDATE user SET email= email + {'dog@email.com'} WHERE first_name='white';
list
cqlsh:my_keyspace> ALTER TABLE user ADD phone list<text> ;
cqlsh:my_keyspace> UPDATE user SET phone =['1234567'] WHERE first_name ='fei' ;
cqlsh:my_keyspace> UPDATE user SET phone[0] = null WHERE first_name ='fei';
map
cqlsh:my_keyspace> ALTER TABLE user ADD food map<text, boolean > ;
cqlsh:my_keyspace> UPDATE user SET food = {'beef': false} WHERE first_name = 'white';
User-Defined Types
cqlsh:my_keyspace> CREATE TYPE address (
... street text,
... city text,
... state text);
cqlsh:my_keyspace> ALTER TABLE user ADD addresses map<text, frozen<address>>;
cqlsh:my_keyspace> UPDATE user SET addresses = {
...'home': { street:'ooo', city: 'xxx' } } WHERE first_name='ben' ;
Secondary Indexes
cqlsh:my_keyspace> CREATE INDEX on user (last_name) ;
cqlsh:my_keyspace> SELECT * FROM user WHERE last_name = 'liu' ;
Exercise
Data Modeling
Defining Application Queries
Each box on the diagram represents a step in the application workflow,
with arrows indicating the flows between steps and the associated query.
Introducing Chebotko Diagrams
K for partition key columns and C↑ or C↓ to
represent clustering columns.
Hotel Logical Data Model
Our first query Q1 is to find hotels near a point of interest, so we’ll call our table hotels_by_poi.
Reservation Logical Data Model
Physical Data Modeling
To draw physical models, we need to be able
to add the typing information for each
column.
Hotel Physical Data Model
Reservation Physical Data Model
Calculating Partition Size
N_r = 5000 hotels × 100 rooms/hotel × 730 days = 365,000,000 rows
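The row count can be checked with a few lines of Python. The variable names are illustrative; the inputs are the figures from the slide.

```python
hotels = 5000
rooms_per_hotel = 100
days = 730  # two years of availability data

# One row per hotel, room and day.
total_rows = hotels * rooms_per_hotel * days
```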
Calculating Size on Disk
Partition size = 16 bytes + 0 bytes + 2.56 GB + 2.92 GB = 5.48 GB
Defining Database Schema
VScode cql extension
III. The Cassandra Architecture
1. The efficiency and the availability of the network topology.
2. The data is distributed to the different nodes with Rings and Tokens.
3. Making data durable and available.
The Design Pattern of Cassandra Cluster
Network Topology
Data Centers and Racks
Cassandra tries to store copies of your data in multiple data centers to maximize availability and partition
tolerance, while preferring to route queries to nodes in the local data center to maximize performance.
Gossip and Failure Detection
1. Once per second, the gossiper will choose a random node in the cluster and initialize
a gossip session with it.
2. The gossip initiator sends its chosen friend a GossipDigestSynMessage.
3. When the friend receives this message, it returns a GossipDigestAckMessage.
4. When the initiator receives the ack message from the friend, it sends the friend a
GossipDigestAck2Message to complete the round of gossip.
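The three-message round can be sketched in Python as a simplified simulation. It assumes each node's state is just a map of endpoint to version number; `gossip_round` and the sample states are hypothetical and leave out most of what the real gossiper tracks.

```python
def gossip_round(initiator, friend):
    """Run one simplified gossip round between two endpoint -> version maps."""
    # 1. SYN: the initiator sends a digest of what it knows.
    syn = dict(initiator)
    # 2. ACK: the friend returns entries where it is newer and lists the
    #    endpoints for which the initiator holds newer state.
    newer_on_friend = {e: v for e, v in friend.items() if v > syn.get(e, -1)}
    wanted = [e for e, v in syn.items() if v > friend.get(e, -1)]
    initiator.update(newer_on_friend)
    # 3. ACK2: the initiator sends the entries the friend asked for.
    ack2 = {e: initiator[e] for e in wanted}
    friend.update(ack2)
    return initiator, friend

node_a = {"127.0.0.1": 5, "127.0.0.2": 1}
node_b = {"127.0.0.2": 4, "127.0.0.3": 2}
gossip_round(node_a, node_b)
```

After the round both nodes hold the newest version they have seen for every endpoint.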
org.apache.cassandra.gms.FailureDetector class
Snitches
The snitch will figure out where nodes are in relation to other nodes.
1. Your selected snitch is wrapped with another snitch called the DynamicEndpointSnitch.
2. The dynamic snitch gets its basic understanding of the topology from the selected snitch types.
3. It then monitors the performance of requests to the other nodes, even keeping track of things like
which nodes are performing compaction. The performance data is used to select the best
replica for each query.
I/O Architecture
Rings and Tokens
• A token is an integer ID used to identify each partition (a 64-bit integer with the default Murmur3Partitioner).
• A node claims ownership of the range of values less than or equal to each token and
greater than the token of the previous node.
• Data is assigned to nodes by using a hash function (partitioner) to calculate a token for the
partition key.
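A toy Python sketch of token ownership on a ring. It assumes a tiny token space of 0 to 399 and uses MD5 as a stand-in hash; `token_for`, `owner` and the three-node ring are hypothetical and do not reproduce Cassandra's partitioners.

```python
import bisect
import hashlib

def token_for(partition_key, ring_size=400):
    """Hash a partition key to a token in a toy token space (MD5 stand-in)."""
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % ring_size

def owner(ring, token):
    """ring is a sorted list of (token, node); return the node owning token."""
    tokens = [t for t, _ in ring]
    # First node whose token is >= the given token owns it.
    index = bisect.bisect_left(tokens, token)
    return ring[index % len(ring)][1]  # wrap around past the last token

ring = [(100, "node1"), (200, "node2"), (300, "node3")]
ben_owner = owner(ring, token_for("ben"))
```

Each node owns the range greater than the previous node's token and up to its own token, so a token of 150 or 200 maps to node2, and anything above 300 wraps around to node1.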
Virtual Nodes
Ref: http://docs.basho.com/riak/kv/2.2.3/learn/concepts/vnodes/
[Figure: node0, node1, node2, node3]
Cassandra’s 1.2 release introduced the concept of virtual nodes, also called vnodes for short. Instead of
assigning a single token to a node, the token range is broken up into multiple smaller ranges.
Replication Strategies
1. The SimpleStrategy places replicas at consecutive nodes around the ring, starting with the node
indicated by the partitioner.
2. The NetworkTopologyStrategy allows you to specify a different replication factor for each data center.
Within a data center, it allocates replicas to different racks in order to maximize availability.
SimpleStrategy
The SimpleStrategy places replicas at consecutive nodes around the ring, starting with the
node indicated by the partitioner.
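Placement at consecutive nodes can be sketched in Python. This assumes the ring is a plain list of node names in ring order; `simple_strategy_replicas` is a hypothetical helper, not Cassandra code.

```python
def simple_strategy_replicas(ring_nodes, start_index, replication_factor):
    """Pick replication_factor consecutive nodes clockwise from start_index."""
    count = min(replication_factor, len(ring_nodes))
    return [
        ring_nodes[(start_index + i) % len(ring_nodes)]
        for i in range(count)
    ]

nodes = ["node1", "node2", "node3", "node4"]
# The partitioner points at node4 (index 3); with a replication factor of 3
# the walk wraps around the ring to node1 and node2.
replicas = simple_strategy_replicas(nodes, 3, 3)
```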
NetworkTopologyStrategy
The total number of replicas that will be stored is equal to the sum of the replication factors for each data
center.
The NetworkTopologyStrategy allows you to
specify a different replication factor for each data
center. Within a data center, it allocates replicas to
different racks in order to maximize availability.
Consistency Levels
For read queries, the consistency level specifies how many replica nodes must respond to a read request
before returning the data.
For write operations, the consistency level specifies how many replica nodes must respond for the write to
be reported as successful to the client.
Setting consistency levels:
(1) ONE, TWO, and THREE, each of which specifies an absolute number of replica nodes that must respond to a request.
(2) The QUORUM consistency level requires a response from a majority of the replica nodes
(i.e. floor(replication factor / 2) + 1).
(3) The ALL consistency level requires a response from all of the replicas.
(4) The ANY consistency level applies to writes only and requires that the write reach at least one node, which may be a hint stored by the coordinator if no replica is available.
R + W > N implies strong consistency, where R is the number of replicas read, W is the number of replicas written, and N is the replication factor.
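The quorum size and the R + W > N rule can be checked with a short Python sketch. The function names are illustrative and not part of any driver.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(replication_factor / 2) + 1."""
    return replication_factor // 2 + 1

def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    return read_replicas + write_replicas > replication_factor

# With a replication factor of 3, QUORUM reads plus QUORUM writes overlap,
# while ONE plus ONE does not.
quorum_of_three = quorum(3)
quorum_both = is_strongly_consistent(quorum(3), quorum(3), 3)
one_both = is_strongly_consistent(1, 1, 3)
```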
Read/Write Data from Nodes
A client may connect to any node in the
cluster to initiate a read or write query.
This node is known as the coordinator
node.
For a read, the coordinator contacts
enough replicas to ensure the required
consistency level is met, and returns the
data to the client.
Read/Write Data from Nodes
For a write, the coordinator node
contacts all replicas, as determined
by the consistency level and
replication factor, and considers
the write successful when a
number of replicas commensurate
with the consistency level
acknowledge the write.
I/O Mechanism
Cassandra node
Cassandra stores data both in memory and on disk to provide both high performance and durability.
Commit Logs
When you perform a write operation, it’s immediately
written to a commit log.
The commit log gets replayed if the database crashes
unexpectedly.
Memtables
After it’s written to the commit log, the value is written
to a memory-resident data structure called the
memtable. Each memtable contains data for a specific
table.
When the number of objects stored in the memtable
reaches a threshold, the contents of the memtable are
flushed to disk in a file called an SSTable and a new
memtable is then created.
SSTables
Each commit log maintains an internal bit flag to
indicate whether it needs flushing.
When a write operation is first received, it is
written to the commit log and its bit flag
is set to 1.
Once the memtable has been properly flushed
to disk, the corresponding commit log’s bit flag
is set to 0, indicating that the commit log no
longer has to maintain that data for durability
purposes.
On reads, Cassandra will read both SSTables and
memtables to find data values.
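The write and read path described on the last few slides can be sketched as a toy Python class. It is a heavy simplification: the `Node` class and its `flush_threshold` are hypothetical, SSTables are plain dicts rather than files, and values are merged newest first rather than by timestamp.

```python
class Node:
    """Toy model of a node's commit log, memtable and SSTables."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []
        self.memtable = {}
        self.sstables = []
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # logged first for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.sstables.append(dict(self.memtable))  # stands in for a file on disk
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            if key in sstable:
                return sstable[key]
        return None

node = Node()
node.write("a", 1)
node.write("b", 2)  # reaches the threshold and triggers a flush
node.write("a", 3)  # stays in the new memtable
```

A read of "a" is served from the memtable, while "b" is found in the flushed SSTable.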
Caching
The key cache stores a map of partition keys to row index
entries, facilitating faster read access into SSTables
stored on disk. The key cache is stored on the JVM heap.
The row cache caches entire rows and can greatly speed
up read access for frequently accessed rows, at the cost
of more memory usage. The row cache is stored in off-
heap memory.
Pseudo Cassandra Cluster
Cassandra Cluster Manager
Cassandra Cluster Manager or ccm is a set of Python scripts that allow you to run a multi-
node cluster on a single machine.
$ sudo pip3 install ccm
$ sudo service cassandra stop
$ ccm create -v 3.0.0 -n 3 my_cluster --vnodes
$ ccm list
$ ccm start
$ ccm status
Cluster: 'my_cluster'
---------------------
node1: UP
node3: UP
node2: UP
Cassandra Cluster Manager
This is equivalent to running the command nodetool status on the individual node.
Cassandra Cluster Manager
We can run the nodetool ring command in order to get a list of the tokens owned by each node.
Adding a Node to a Cluster
$ ccm add node4 -i 127.0.0.4 -j 7400
The tokens will be reallocated across all of the nodes.
$ cd ~/.ccm; ls
CURRENT my_cluster repository
$ cd my_cluster; ls
cluster.conf node1 node2 node3
$ cd ~/.ccm/my_cluster
$ diff node1/conf/ node2/conf/
Cluster Configuration
Seed Nodes
A seed node is used as a contact point for other nodes, so Cassandra can learn the topology of the
cluster—that is, what hosts have what ranges.
For example, if node A acts as a seed for node C, when node C comes online, it will use node A as a
reference point from which to get the topology. This process is known as bootstrapping.
Seed nodes do not auto bootstrap because it is assumed that they will be the first nodes in the cluster.
A
B
C
cassandra.yaml in node1 to node3
node1 - seeds: 127.0.0.1
node2 - seeds: 127.0.0.1,127.0.0.2
node3 - seeds: 127.0.0.1,127.0.0.2,127.0.0.3
Snitches
Snitches gather some information about your network topology so that Cassandra can efficiently
route requests.
• SimpleSnitch: it is unsuitable for multi-data center deployments. If you choose to use this snitch, you
should also use the SimpleStrategy replication strategy for your keyspaces.
• PropertyFileSnitch: it uses information you provide about the topology of your cluster in a standard Java
key/value properties file called cassandra-topology.properties.
• GossipingPropertyFileSnitch: each node exchanges information about its own rack and data center
location with other nodes via gossip. The rack and data center locations are defined in the
cassandra-rackdc.properties file.
Snitches
You configure the endpoint snitch implementation to use by updating the endpoint_snitch property in
the cassandra.yaml file.
Exercise
1. Use ccm to create a pseudo Cassandra cluster with 3 nodes. Set the Cassandra version of the nodes
to 3.0.0. The nodes use vnodes to segment the tokens.
2. Before starting up the cluster, configure the settings of each node. Use GossipingPropertyFileSnitch
to assign the data center and the rack of each node.
3. Stop the pseudo cluster. Change the snitch setting to SimpleSnitch and restart the cluster.
What happens after you switch from GossipingPropertyFileSnitch to SimpleSnitch? Try to resolve
the error.
Tokens and Virtual Nodes
You configure the number of tokens by updating the num_tokens property in the cassandra.yaml file.
With num_tokens set to 1, each node holds just a single token, as the figure below shows.
Network Interfaces
Node IP
• listen_address: the IP address of the node.
• storage_port: designates the port used for inter-node communications, typically 7000.
Thrift transport (Remote Procedure Call which will be removed entirely in a future release)
• rpc_port: default 9160.
• rpc_address: the IP address of the node.
Native transport (since Cassandra 0.8)
• start_native_transport: set it to true to enable native transport (the native transport handles
the communication between client and server).
• native_transport_port: designate the port used for native transport, typically 9042.
Data Storage
• commitlog_directory: the directory to store the commit logs.
• data_file_directories: the directory to store SSTables.
• disk_failure_policy, commit_failure_policy: set the failure response.
Create a Cassandra Cluster
Ref: https://twgame.wordpress.com/2015/02/16/real-machine-cassandra-cluster/
Building a Cassandra Cluster
node1 node2
IV. Clients
JVM languages
JVM IDE
libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "3.5.1"
libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.6.4"
libraryDependencies += "org.apache.logging.log4j" % "log4j-core" % "2.11.1"
Scala client
build.sbt
Scala client
Setting Consistency Levels
V. Maintenance
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdf
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 

Cassandra

  • 2. Outline I. Why Cassandra? II. Basic Operations III. The Cassandra Architecture IV. Clients V. Maintenance
  • 4. CAP theorem • Consistency: all nodes see the same data at the same time. • Availability: a guarantee that every request receives a response about whether it succeeded or failed. • Partition Tolerance: the system continues to operate despite arbitrary message loss or failure of part of the system. Ref: http://uzigood.blogspot.com/2016/06/cap-theorem.html
  • 5. Partition Tolerance of Mongo Ref: https://docs.mongodb.com/manual/replication/ Ref: https://docs.mongodb.com/manual/core/read-preference/
  • 6. Partition Tolerance of Cassandra Cassandra uses consistent hashing to determine which nodes in your cluster must manage the data you are passing in. You set a replication factor, which states how many nodes your data is replicated to. How big can it scale? Cassandra can handle the load of applications like Instagram that have roughly 80 million photos uploaded to the database every day.
  • 9. IMS, RDBMSs, NoSQL. The horse, the car, the plane.
  • 12. Terminology 1. Apache Cassandra Cluster: An Apache Cassandra cluster is a database server spread across a number of machines. 2. Keyspaces: A keyspace is a logical grouping of Apache Cassandra tables. 3. Tables: An Apache Cassandra table is similar to an RDBMS table. 4. Primary Key: A primary key uniquely identifies an Apache Cassandra row. A primary key can be a simple key or a composite key. A composite key is made up of two parts, a partition key and a clustering key. The partition key determines data distribution in the cluster while the clustering key determines sort order within a partition.
  • 13. Get started cqlsh> DESCRIBE CLUSTER; cqlsh> DESCRIBE KEYSPACES; cqlsh> CREATE KEYSPACE my_keyspace WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 1}; cqlsh> USE my_keyspace; cqlsh:my_keyspace> CREATE TABLE user ( first_name text, last_name text, PRIMARY KEY (first_name)); Ref: http://abiasforaction.net/cassandra-query-language-cql-tutorial/
  • 14. cqlsh> DESCRIBE KEYSPACES; cqlsh:my_keyspace> DESCRIBE KEYSPACES; cqlsh:my_keyspace> DESCRIBE KEYSPACE my_keyspace; cqlsh:my_keyspace> DESCRIBE TABLE user; DESCRIBE
  • 15. INSERT cqlsh:my_keyspace> INSERT INTO user (first_name , last_name ) VALUES ('ben', 'liu'); cqlsh:my_keyspace> SELECT * FROM user; cqlsh:my_keyspace> SELECT * FROM user WHERE first_name='ben'; cqlsh:my_keyspace> SELECT COUNT (*) FROM user; DELETE cqlsh:my_keyspace> DELETE last_name FROM user WHERE first_name ='ben';
  • 16. Exercises 1. Create a keyspace named mifly. The class of this keyspace is SimpleStrategy and the value of replication_factor is set to 1. 2. Create a table named employees. This table has two columns, first_name and last_name, both of type text. Set first_name as the primary key of the table. 3. To check that first_name has been set as the primary key, use DESCRIBE to get the information of employees. 4. Insert the following rows into employees as (first_name, last_name): (ben, liu), (maka, long).
  • 17. Exercises 5. Dump all columns and all rows from employees. 6. Delete the employee whose first name is maka. 7. Drop table employees. 8. Drop keyspace mifly.
  • 19. Cassandra’s Data Model cqlsh:my_keyspace> INSERT INTO user (first_name , last_name ) VALUES ( 'doggy', 'wang');
  • 21. cqlsh:my_keyspace> UPDATE user SET last_name = 'liu' WHERE first_name ='white' ; UPDATE
  • 22. ALTER ALTER TABLE user ADD phone text ; ALTER TABLE user DROP phone ;
  • 23. Timestamps cqlsh:my_keyspace> SELECT first_name,last_name, writetime(last_name) from user; Cassandra uses these timestamps for resolving any conflicting changes that are made to the same value. Generally, the last timestamp wins.
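The "last timestamp wins" rule above can be illustrated with a minimal Python sketch. The `Cell` class and `resolve` function are hypothetical names for illustration, not part of Cassandra, and tie-breaking between equal timestamps is simplified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for one column value and its write timestamp."""
    value: str
    writetime: int  # microseconds since the epoch, the unit writetime() reports


def resolve(a: Cell, b: Cell) -> Cell:
    """Return the cell with the later write timestamp (last write wins).

    Assumption: on equal timestamps this sketch simply keeps `a`.
    """
    return a if a.writetime >= b.writetime else b


older = Cell("liu", 1_500_000_000_000_000)
newer = Cell("liou", 1_500_000_000_000_500)
winner = resolve(older, newer)
print(winner.value)
```

The order of the arguments does not matter when the timestamps differ: the value carrying the larger timestamp is kept.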
  • 24. TTL (time to live) cqlsh:my_keyspace> SELECT first_name, last_name, TTL(last_name) FROM user; cqlsh:my_keyspace> UPDATE user USING TTL 30 SET last_name='liou' WHERE first_name ='white' ;
  • 25. Exercises 1. Create a keyspace named mifly. The class of this keyspace is SimpleStrategy and the value of replication_factor is set to 1. 2. Create a table named employees. This table has two columns, first_name and last_name, both of type text. Set first_name as the primary key of the table. 3. To check that first_name has been set as the primary key, use DESCRIBE to get the information of employees. 4. Insert the following rows into employees as (first_name, last_name): (ben, liu), (maka, long), (feifei, empty). Leave the last_name of feifei empty.
  • 26. Exercises 5. Select feifei and change the value of last_name to king. 6. Add an email column to the table. The data type of the email column is text. 7. Dump first_name, last_name and the TTL of email. 8. Set the email address of ben to mifly@gmail.com and set the TTL to 30s. 9. Drop table employees. 10. Drop keyspace mifly.
  • 28. Data Types cqlsh:my_keyspace> CREATE TABLE user ( first_name text, last_name text, PRIMARY KEY (first_name)); Example rows as (first_name text, last_name text): (ben, liu), (maka, long)
  • 30. Textual Data Types Other Simple Data Types • boolean: This is a simple true/false value. • blob: A binary large object (blob) is a colloquial computing term for an arbitrary array of bytes. • inet: This type represents IPv4 or IPv6 Internet addresses. • counter: The counter data type provides a 64-bit signed integer, whose value cannot be set directly, but only incremented or decremented.
  • 31. Time and Identity Data Types • timestamp: A date and time, stored as a 64-bit signed integer of milliseconds since the epoch. It can be entered using ISO 8601 date formats (e.g. 2015-06-15 20:05-0700, 2015-06-15 20:05:07.013-0700). • date, time: The 2.2 release introduced date and time types that allowed these to be represented independently. • uuid: This is a Type 4 UUID (universally unique identifier) which is a 128-bit value based entirely on random numbers (e.g. 1a6300ca-0572-4736-a393-c0b7229e193e). • timeuuid: This is a Type 1 UUID, which is based on the MAC address of the computer, the system time, and a sequence number used to prevent duplicates.
  • 32. uuid cqlsh:my_keyspace> ALTER TABLE user ADD id uuid; cqlsh:my_keyspace> UPDATE user SET id = uuid() WHERE first_name ='ben' ; Ref: https://docs.datastax.com/en/cql/3.3/cql/cql_reference/timeuuid_functions_r.html
  • 33. Collections • set: The set data type stores a collection of elements. • list: The list data type contains an ordered list of elements. • map: The map data type contains a collection of key/value pairs.
  • 34. set cqlsh:my_keyspace> ALTER TABLE user ADD email set<text> ; UPDATE user SET email = {'a@email.com', 'b@emai.com'} WHERE first_name ='ben'; UPDATE user SET email= email + {'dog@email.com'} WHERE first_name='white';
  • 35. list cqlsh:my_keyspace> ALTER TABLE user ADD phone list<text> ; cqlsh:my_keyspace> UPDATE user SET phone =['1234567'] WHERE first_name ='fei' ; cqlsh:my_keyspace> UPDATE user SET phone[0] = null WHERE first_name ='fei';
  • 36. map cqlsh:my_keyspace> ALTER TABLE user ADD food map<text, boolean > ; cqlsh:my_keyspace> UPDATE user SET food = {'beef': false} WHERE first_name = 'white';
  • 37. User-Defined Types cqlsh:my_keyspace> CREATE TYPE address ( ... street text, ... city text, ... state text); cqlsh:my_keyspace> ALTER TABLE user ADD addresses map<text, frozen<address>>; cqlsh:my_keyspace> UPDATE user SET addresses = { ...'home': { street:'ooo', city: 'xxx' } } WHERE first_name='ben' ;
  • 38. Secondary Indexes cqlsh:my_keyspace> CREATE INDEX on user (last_name) ; cqlsh:my_keyspace> SELECT * FROM user WHERE last_name = 'liu' ;
  • 41. Defining Application Queries Each box on the diagram represents a step in the application workflow, with arrows indicating the flows between steps and the associated query.
  • 42. Introducing Chebotko Diagrams K for partition key columns and C↑ or C↓ to represent clustering columns.
  • 43. Hotel Logical Data Model Our first query Q1 is to find hotels near a point of interest, so we’ll call our table hotels_by_poi.
  • 45. Physical Data Modeling To draw physical models, we need to be able to add the typing information for each column.
  • 48. Calculating Partition Size N_r = 5,000 hotels × 100 rooms/hotel × 730 days = 365,000,000 rows
  • 49. Calculating Size on Disk Partition size = 16 bytes + 0 bytes + 2.56 GB + 2.92 GB = 5.48 GB
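The two calculations above can be re-checked with a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative, and it assumes decimal gigabytes (1 GB = 10^9 bytes) for the two large terms taken from the slide.

```python
# Row count: hotels x rooms per hotel x days, using the figures from the slide.
HOTELS = 5000
ROOMS_PER_HOTEL = 100
DAYS = 730

rows = HOTELS * ROOMS_PER_HOTEL * DAYS

# Size on disk: the four terms from the slide, expressed in bytes.
# Assumption: "GB" on the slide means 10**9 bytes.
GB = 10**9
terms_in_bytes = [16, 0, 2_560_000_000, 2_920_000_000]
size_bytes = sum(terms_in_bytes)

print(f"{rows:,} rows")
print(f"{size_bytes / GB:.2f} GB")
```

The 16-byte term is negligible next to the two gigabyte-scale terms, which is why the total rounds to 5.48 GB.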
  • 52. III. The Cassandra Architecture
  • 53. 1. The efficiency and the availability of the network topology. 2. The data is distributed to the different nodes with Rings and Tokens. 3. Making data durable and available. The Design Pattern of Cassandra Cluster
  • 55. Data Centers and Racks Cassandra tries to store copies of your data in multiple data centers to maximize availability and partition tolerance, while preferring to route queries to nodes in the local data center to maximize performance.
  • 56. Gossip and Failure Detection 1. Once per second, the gossiper will choose a random node in the cluster and initiate a gossip session with it. 2. The gossip initiator sends its chosen friend a GossipDigestSynMessage. 3. When the friend receives this message, it returns a GossipDigestAckMessage. 4. When the initiator receives the ack message from the friend, it sends the friend a GossipDigestAck2Message to complete the round of gossip. Failure detection is implemented by the org.apache.cassandra.gms.FailureDetector class.
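The syn / ack / ack2 exchange described above can be sketched as a toy simulation. Everything in this block is an assumption made for illustration: the `Node` class, the `(version, status)` state layout, and the contents of each message are simplified and do not mirror Cassandra's actual classes or wire format.

```python
import random


class Node:
    """Toy gossip participant; state maps endpoint name -> (version, status)."""

    def __init__(self, name):
        self.name = name
        self.state = {name: (1, "UP")}

    def digest(self):
        """Summary of what this node knows: endpoint -> version only."""
        return {ep: version for ep, (version, _) in self.state.items()}

    def handle_syn(self, syn_digest):
        """Build the ack: entries we hold that are newer, plus endpoints we want."""
        mine = self.digest()
        newer = {ep: self.state[ep] for ep, v in mine.items()
                 if v > syn_digest.get(ep, 0)}
        wanted = [ep for ep, v in syn_digest.items() if v > mine.get(ep, 0)]
        return newer, wanted

    def apply(self, updates):
        """Keep an incoming entry only if its version is newer than ours."""
        for ep, (version, status) in updates.items():
            if version > self.state.get(ep, (0, None))[0]:
                self.state[ep] = (version, status)


def pick_peer(nodes, me):
    """Step 1: choose a random node other than ourselves."""
    return random.choice([n for n in nodes if n is not me])


def gossip_round(initiator, friend):
    syn = initiator.digest()                            # step 2: syn message
    newer, wanted = friend.handle_syn(syn)              # step 3: ack message
    initiator.apply(newer)
    ack2 = {ep: initiator.state[ep] for ep in wanted}   # step 4: ack2 message
    friend.apply(ack2)


a, b = Node("a"), Node("b")
gossip_round(a, pick_peer([a, b], a))
print(sorted(a.state), sorted(b.state))
```

After one round both nodes know about each other, which is the point of the three-message exchange: each side learns what the other has that is newer.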
  • 57. Snitches The snitch will figure out where nodes are in relation to other nodes. 1. Your selected snitch is wrapped with another snitch called the DynamicEndpointSnitch. 2. The dynamic snitch gets its basic understanding of the topology from the selected snitch types. 3. It then monitors the performance of requests to the other nodes, even keeping track of things like which nodes are performing compaction. The performance data is used to select the best replica for each query.
  • 59. Rings and Tokens • A token is an integer ID used to identify each partition (64-bit with the default Murmur3Partitioner). • A node claims ownership of the range of values less than or equal to each token and greater than the token of the previous node. • Data is assigned to nodes by using a hash function (partitioner) to calculate a token for the partition key.
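The ownership rule above (a node owns the range greater than the previous node's token and up to its own token) can be sketched in Python. The `Ring` class and `token_for` function are hypothetical; in particular, `token_for` uses the first 8 bytes of an MD5 digest as a signed 64-bit value purely as a stand-in, not as Cassandra's real partitioner.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Stand-in partitioner: hash the key to a signed 64-bit integer.

    Assumption: this is NOT Cassandra's partitioner, only a placeholder hash.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """One token per node; a node owns (previous token, its own token]."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def owner(self, token):
        # First node whose token is >= the given token; wrap past the end.
        i = bisect.bisect_left(self.tokens, token)
        return self.entries[i % len(self.entries)][1]


ring = Ring({"node1": -100, "node2": 0, "node3": 100})
print(ring.owner(-99))    # falls in (-100, 0], owned by node2
print(ring.owner(101))    # past the largest token, wraps to node1
print(ring.owner(token_for("ben")))
```

The wrap-around case is what makes the structure a ring: tokens larger than the highest node token belong to the node with the lowest token.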
  • 60. Virtual Nodes Cassandra’s 1.2 release introduced the concept of virtual nodes, also called vnodes for short. Instead of assigning a single token to a node, the token range is broken up into multiple smaller ranges. Ref: http://docs.basho.com/riak/kv/2.2.3/learn/concepts/vnodes/
  • 61. Replication Strategies 1. The SimpleStrategy places replicas at consecutive nodes around the ring, starting with the node indicated by the partitioner. 2. The NetworkTopologyStrategy allows you to specify a different replication factor for each data center. Within a data center, it allocates replicas to different racks in order to maximize availability.
  • 62. SimpleStrategy The SimpleStrategy places replicas at consecutive nodes around the ring, starting with the node indicated by the partitioner.
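A minimal sketch of the placement rule above: start at the node that owns the token, then walk clockwise around the ring until enough distinct nodes have been collected. The function name and the `(token, node)` list layout are assumptions for illustration, and the sketch assumes one token per node.

```python
import bisect


def simple_strategy_replicas(ring, token, replication_factor):
    """Pick replicas by walking consecutive nodes around the ring.

    ring: list of (token, node) pairs sorted by token, one token per node.
    """
    tokens = [t for t, _ in ring]
    # Node indicated by the partitioner: first token >= the key's token.
    start = bisect.bisect_left(tokens, token) % len(ring)
    distinct_nodes = len({node for _, node in ring})
    wanted = min(replication_factor, distinct_nodes)

    replicas = []
    i = start
    while len(replicas) < wanted:
        node = ring[i % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        i += 1
    return replicas


ring = [(-100, "node1"), (0, "node2"), (100, "node3")]
print(simple_strategy_replicas(ring, 50, 2))
```

Capping the count at the number of distinct nodes keeps the loop from running forever when the replication factor exceeds the cluster size.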
  • 63. NetworkTopologyStrategy The total number of replicas that will be stored is equal to the sum of the replication factors for each data center. The NetworkTopologyStrategy allows you to specify a different replication factor for each data center. Within a data center, it allocates replicas to different racks in order to maximize availability.
  • 64. Consistency Levels For read queries, the consistency level specifies how many replica nodes must respond to a read request before returning the data. For write operations, the consistency level specifies how many replica nodes must respond for the write to be reported as successful to the client. Setting consistency levels: (1) ONE, TWO, and THREE, each of which specifies an absolute number of replica nodes that must respond to a request. (2) The QUORUM consistency level requires a response from a majority of the replica nodes, i.e. floor(replication factor / 2) + 1. (3) The ALL consistency level requires a response from all of the replicas. (4) The ANY consistency level, available for writes only, requires the write to reach at least one node; a stored hint counts. With R replicas read, W replicas written and replication factor N, R + W > N gives strong consistency.
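The quorum size and the R + W > N rule can be checked with a short Python sketch. The function names are illustrative and are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(replication_factor / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


n = 3
q = quorum(n)
print(q)
print(is_strongly_consistent(q, q, n))   # QUORUM reads + QUORUM writes
print(is_strongly_consistent(1, 1, n))   # ONE reads + ONE writes
```

With a replication factor of 3, QUORUM reads and writes each touch 2 replicas, so at least one replica is in both sets; reading and writing at ONE gives no such overlap.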
  • 65. Read/Write Data from Nodes A client may connect to any node in the cluster to initiate a read or write query. This node is known as the coordinator node. For a read, the coordinator contacts enough replicas to ensure the required consistency level is met, and returns the data to the client.
  • 66. Read/Write Data from Nodes For a write, the coordinator node contacts all replicas, as determined by the consistency level and replication factor, and considers the write successful when a number of replicas commensurate with the consistency level acknowledge the write.
  • 68. Cassandra node Cassandra stores data both in memory and on disk to provide both high performance and durability.
  • 69. Commit Logs When you perform a write operation, it’s immediately written to a commit log. The commit log gets replayed if the database crashes unexpectedly.
  • 70. Memtables After it’s written to the commit log, the value is written to a memory-resident data structure called the memtable. Each memtable contains data for a specific table. When the number of objects stored in the memtable reaches a threshold, the contents of the memtable are flushed to disk in a file called an SSTable and a new memtable is then created.
  • 71. SSTables Each commit log maintains an internal bit flag to indicate whether it needs flushing. When a write operation is first received, it is written to the commit log and its bit flag is set to 1. Once the memtable has been properly flushed to disk, the corresponding commit log’s bit flag is set to 0, indicating that the commit log no longer has to maintain that data for durability purposes. On reads, Cassandra will read both SSTables and memtables to find data values.
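The write path described in the last three slides (commit log, then memtable, then flush to an SSTable) can be sketched as a toy in-memory model. The `TableStore` class is hypothetical and heavily simplified: real SSTables are immutable files on disk, and real reads merge values by timestamp rather than simply preferring the newest structure.

```python
class TableStore:
    """Toy model of one table's write path; not Cassandra's implementation."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.commit_log = []   # append-only record of unflushed writes
        self.memtable = {}     # in-memory, holds the most recent writes
        self.sstables = []     # flushed snapshots, oldest first

    def write(self, key, value):
        # 1. Record the write in the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Then apply it to the memtable.
        self.memtable[key] = value
        # 3. Flush once the memtable holds enough objects.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, no-longer-modified snapshot.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs the commit log (the "bit flag set to 0").
        self.commit_log.clear()

    def read(self, key):
        # Check the memtable first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None


store = TableStore(flush_threshold=2)
store.write("ben", "liu")
store.write("maka", "long")   # reaches the threshold and triggers a flush
store.write("ben", "liou")    # newer value lives only in the memtable
print(store.read("ben"), store.read("maka"))
```

Reading `ben` returns the memtable value while `maka` is served from the flushed SSTable, which mirrors the slide's point that reads consult both structures.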
  • 72. Caching The key cache stores a map of partition keys to row index entries, facilitating faster read access into SSTables stored on disk. The key cache is stored on the JVM heap. The row cache caches entire rows and can greatly speed up read access for frequently accessed rows, at the cost of more memory usage. The row cache is stored in off-heap memory.
  • 74. Cassandra Cluster Manager Cassandra Cluster Manager or ccm is a set of Python scripts that allow you to run a multi-node cluster on a single machine. $ sudo pip3 install ccm $ sudo service ccm stop $ ccm create -v 3.0.0 -n 3 my_cluster --vnodes $ ccm list $ ccm start $ ccm status Cluster: 'my_cluster' --------------------- node1: UP node3: UP node2: UP
  • 75. Cassandra Cluster Manager This is equivalent to running the command nodetool status on the individual node.
  • 76. Cassandra Cluster Manager We can run the nodetool ring command in order to get a list of the tokens owned by each node.
  • 77. Adding a Node to a Cluster $ ccm add node4 -i 127.0.0.4 -j 7400 The tokens will be reallocated across all of the nodes.
  • 78. $ cd ~/.ccm; ls CURRENT my_cluster repository $ cd my_cluster; ls cluster.conf node1 node2 node3 $ cd ~/.ccm/my_cluster $ diff node1/conf/ node2/conf/ Cluster Configuration
  • 79. Seed Nodes A seed node is used as a contact point for other nodes, so Cassandra can learn the topology of the cluster, that is, what hosts have what ranges. For example, if node A acts as a seed for node C, when node C comes online, it will use node A as a reference point from which to get topology. This process is known as bootstrapping. Seed nodes do not auto bootstrap because it is assumed that they will be the first nodes in the cluster. cassandra.yaml in node1 to node3: node1 - seeds: 127.0.0.1; node2 - seeds: 127.0.0.1,127.0.0.2; node3 - seeds: 127.0.0.1,127.0.0.2,127.0.0.3
  • 80. Snitches Snitches gather some information about your network topology so that Cassandra can efficiently route requests. • Simple Snitch: it is unsuitable for multi-data center deployments. If you choose to use this snitch, you should also use the SimpleStrategy replication strategy for your keyspaces. • Property File Snitch: it uses information you provide about the topology of your cluster in a standard Java key/value properties file called cassandra-topology.properties. • Gossiping Property File Snitch: each node exchanges information about its own rack and data center location with other nodes via gossip. The rack and data center locations are defined in the cassandra-rackdc.properties file.
  • 81. Snitches You configure the endpoint snitch implementation to use by updating the endpoint_snitch property in the cassandra.yaml file.
  • 82. Exercise 1. Use ccm to create a pseudo Cassandra cluster with 3 nodes. Set the Cassandra version of the nodes to 3.0.0. The nodes use vnodes to segment the tokens. 2. Before starting up the cluster, configure the settings of each node. Use GossipingPropertyFileSnitch to assign the data center and the rack of each node. 3. Stop the pseudo cluster. Change the snitch setting to SimpleSnitch and restart the cluster. What happens after you switch from GossipingPropertyFileSnitch to SimpleSnitch? Try to solve that error.
  • 83. Tokens and Virtual Nodes You configure the number of tokens by updating the num_tokens property in the cassandra.yaml file. The value of num_tokens is configured to 1 and the result is shown in the figure below. Each node holds just one token.
  • 84. Network Interfaces Node ip • listen_address: the ip address of the node. • storage_port: designate the port used for inter-node communications, typically 7000. Thrift transport (Remote Procedure Call which will be removed entirely in a future release) • rpc_port: default 9160. • rpc_address: the ip address of the node. native transport (since cassandra 0.8) • start_native_transport: set it to true to enable native transport (the native transport handles the communication between client and server). • native_transport_port: designate the port used for native transport, typically 9042.
  • 85. Data Storage • commitlog_directory: the directory to store the commit logs. • data_file_directories: the directory to store SSTables. • disk_failure_policy, commit_failure_policy: set the failure response.
  • 91. libraryDependencies += "com.datastax.cassandra" % "cassandra-driver-core" % "3.5.1" libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.6.4" libraryDependencies += "org.apache.logging.log4j" % "log4j-core" % "2.11.1" Scala client build.sbt