An introduction to Cassandra given at "Scala by the Lagoon", the Venice-area Scala user group.
It covers how the database is designed and how it works, and the CAP theorem and its implications for distributed databases.
It also offers a first look at the Cassandra Query Language and a primer on Phantom, a Scala DSL for connecting to a Cassandra cluster.
5. What’s Cassandra
• NoSQL
• High availability
• Linear Scalability
• Predictable Performance
• No Single Point of Failure (peer to peer)
• Multi DataCenter (DC)
• A hybrid between a key-value and a column-oriented store
• Clients see data in tables
• Data manipulation through CQL (similar to SQL)
• Not a replacement for an RDBMS
“Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many servers, providing high availability with no single point of failure.”
7. What’s Cassandra
Use cases
• Internet of Things: time series from users, devices, and sensors
• Personalization: ingest and analyze data to personalize the user experience
• Messaging: storing, managing, and analyzing messages
• Fraud detection: quickly analyzing patterns
• Playlist: storing collections of user-selected items
8. What’s Cassandra
Quick history so far
• Developed at Facebook for its messaging system
• From Google Bigtable: the storage model
• From Amazon Dynamo: the distribution backbone
• Open source since 2008
• Apache top-level project since 2010
• Most stable release: 2.2 (2015)
• Latest release: 3.4 (March 2016)
• Tick-tock releases in 2016
9. What’s Cassandra
• No master/slave, no ZooKeeper, no config server
• All nodes can reply to writes and reads
• Data is partitioned around the ring
• The location of data on the ring is determined by a function (the partitioner)
• Data is replicated according to a replication factor
[Figure: a ring of four nodes (1 to 4) with tokens -128, -64, 0, and 64]
C* is a big hash ring
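The ring lookup described on this slide can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the node names, the assignment of the four tokens to nodes, and the toy `token` function are made up here (Cassandra's default partitioner is Murmur3 over a 64-bit range, not `hashCode`). What it shows is the ownership rule: a node owns the range that ends at its token, and keys past the highest token wrap around to the first node.

```scala
import scala.collection.immutable.TreeMap

object RingSketch {
  // Toy ring using the four tokens from the slide; which node holds which token is assumed.
  val ring: TreeMap[Int, String] = TreeMap(
    -128 -> "node1",
    -64  -> "node2",
    0    -> "node3",
    64   -> "node4"
  )

  // Stand-in partitioner: folds the key's hashCode into the toy range -128..127.
  // Hypothetical; not the hash function Cassandra actually uses.
  def token(partitionKey: String): Int = partitionKey.hashCode.toByte.toInt

  // A node owns the range (previous token, own token].
  // Tokens above the highest node token wrap around to the first node.
  def ownerOf(token: Int): String = {
    val it = ring.iteratorFrom(token)
    if (it.hasNext) it.next()._2 else ring.head._2
  }
}
```

For example, `RingSketch.ownerOf(RingSketch.token("Poe"))` returns the name of the node that would own the row whose partition key is "Poe" in this toy ring.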
11. The CAP Theorem
Three properties of a shared-data system operating over a network
Consistency
every read gets the most recent write
Availability
every node that has not failed always answers queries in a finite (though unbounded) time
Partition Tolerance
the system continues to operate despite arbitrary partitioning due to network failures
12. The CAP Theorem (rough)
“pick two”
1. AP: the system is Partitioned and Available to clients, but not Consistent
2. CP: the system is Consistent and keeps working during a Partition, but there is no Availability
3. AC: the system is Consistent and Available, but there is no Partition. Does this system really exist?
“this is often considered to be a misleading presentation, since you cannot build - or choose - 'partition tolerance': your system either might experience partitions or it won't”.
13. The CAP Theorem (finer)
“pick one”, or rather: what do you give up?
1. Do you respond to reads with potentially stale information?
2. Do you wait (potentially forever) to hear from the other side of the partition, and so compromise availability?
If a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. This is the critical condition.
14. Consistency & Availability Tradeoff
Spectrum
Both C and A are strong guarantees in a critical condition
But outside critical conditions, a system can be happily consistent and available
AP to CP is a spectrum and the whole space is useful
Eventual consistency and high availability are good enough
[Figure: a spectrum running from AVAILABILITY (AP) to CONSISTENCY (CP)]
15. Beyond CAP: PACELC
P(A/C)E(L/C)
If there is a partition (P), how does the system trade off availability and consistency (A and C)?
Else (E), when the system is running normally in the absence of partitions, how does the system trade off latency (L) and consistency (C)?
[Figure: two axes, from AVAILABILITY (AP) to CONSISTENCY (CP) and from (LESS) LATENCY to CONSISTENCY]
16. Cassandra & PACELC
C* has tunable consistency
It can move from PC/EC to PA/EL
Consistency can be tuned at query level, from maximum availability to immediate consistency
[Figure: axes from AVAILABILITY (AP) to CONSISTENCY (CP) and from (LESS) LATENCY to CONSISTENCY, with PA/EL and PC/EC marked]
18. Internals
Cluster
A cluster is a peer-to-peer set of servers
Organized in
• Data centers
• Racks
• Nodes
19. Internals
Joining a cluster
A node needs
• Cluster name
• Seed addresses
• Its listen address
Every node is aware of the status of the cluster through gossip
20. Internals
Replication
Tables are grouped by keyspace
Every keyspace has its own replication properties
CREATE KEYSPACE books
WITH REPLICATION =
{'class': 'SimpleStrategy',
'replication_factor' : 3};
CREATE KEYSPACE books
WITH REPLICATION =
{'class': 'NetworkTopologyStrategy',
'dc-newyork': 3,
'dc-london': 4};
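To make the replication factor concrete, here is a small Scala sketch of how replicas could be chosen for a keyspace using SimpleStrategy: the node that owns the token, plus the next nodes clockwise on the ring until the replication factor is reached. The node names and the four-node ring are assumptions for illustration. NetworkTopologyStrategy also takes data centers and racks into account and is not modeled here.

```scala
object ReplicaSketch {
  // Nodes listed in ring order (clockwise); the names are illustrative.
  val nodesInRingOrder: Vector[String] = Vector("node1", "node2", "node3", "node4")

  // SimpleStrategy-like placement: the owner plus the next (rf - 1) nodes clockwise.
  // The number of replicas is capped at the number of nodes in this sketch.
  def replicas(ownerIndex: Int, rf: Int): Vector[String] = {
    val n = nodesInRingOrder.size
    (0 until math.min(rf, n)).map(i => nodesInRingOrder((ownerIndex + i) % n)).toVector
  }
}
```

With a replication factor of 3, a row owned by the last node on the ring is also stored on the first two: `ReplicaSketch.replicas(3, 3)` gives `Vector("node4", "node1", "node2")`.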
21. Internals
Request coordinator
SELECT * FROM authors
WHERE name = 'Poe';
• The node that handles communication with the client
• Every node can act as coordinator
• The driver decides which node is the coordinator
22. Internals
Consistent hashing
CREATE TABLE authors(
name text,
bio text ,
PRIMARY KEY (name));
INSERT INTO authors
(name, bio)
VALUES
('Poe', 'Poe was a writer');
• Tokens range from -2^63 to 2^63-1 (about 1.8*10^19 values) and are distributed across the cluster
• Every node knows its tokens
• Partitioner is a function from a partition key to a token
• Nodes and drivers use the same partitioner
23. Internals
Replication
[Figure: the row (Poe, Poe was a writer) stored on more than one node of the ring]
CREATE TABLE authors(
name text,
bio text ,
PRIMARY KEY (name));
INSERT INTO authors
(name, bio)
VALUES
('Poe', 'Poe was a writer');
24. Internals
Hinted handoff
[Figure: the row (Poe, Poe was an American writer) shown on two nodes]
• What if node 1 is down? Do we give up writing?
• Hinted handoff allows writes to be replayed to a momentarily unavailable node
• It’s optional
• There are some limitations
• The coordinator keeps the data
• What about consistency?
UPDATE authors
SET bio = 'Poe was an American writer'
WHERE name = 'Poe';
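The idea of hinted handoff can be sketched as a small in-memory simulation. Everything here (the `Replica` and `Coordinator` classes, the `up` flag) is hypothetical and only mirrors the bullets above: the coordinator applies the write to reachable replicas, keeps a hint for each unreachable one, and replays the hints when that replica comes back. Real hinted handoff also has time limits and storage details that are not modeled.

```scala
import scala.collection.mutable

object HintedHandoffSketch {
  final case class Write(key: String, value: String)

  // In-memory stand-in for a replica node; `up` simulates reachability.
  final class Replica(val name: String) {
    var up: Boolean = true
    val data: mutable.Map[String, String] = mutable.Map.empty
  }

  final class Coordinator(replicas: Seq[Replica]) {
    // Hints kept on the coordinator, grouped by the name of the unreachable replica.
    private val hints = mutable.Map.empty[String, Vector[Write]]

    // Applies the write to reachable replicas, stores a hint for the others,
    // and returns how many replicas acknowledged the write.
    def write(w: Write): Int = {
      val (alive, down) = replicas.partition(_.up)
      alive.foreach(r => r.data.update(w.key, w.value))
      down.foreach(r => hints.update(r.name, hints.getOrElse(r.name, Vector.empty) :+ w))
      alive.size
    }

    // Replays stored hints, in order, to a replica that is reachable again.
    def replayHints(r: Replica): Unit =
      if (r.up) hints.remove(r.name).getOrElse(Vector.empty).foreach(w => r.data.update(w.key, w.value))
  }
}
```

Note that the number of acknowledgements returned by `write` is what a consistency level would be checked against: a hint stored on the coordinator does not count as a replica having the data.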
25. Internals
Consistency level
CONSISTENCY TWO;
SELECT * FROM authors
WHERE name = 'Poe';
The full read is served by one node only; the other replicas contacted send only a digest
[Figure: three replicas; two hold "Poe was an American writer", one still holds "Poe was a writer"]
• How many nodes must acknowledge a request before a positive response is returned to the client (read/write)
• Set at query level
• What if replicas do not agree?
26. Internals
Consistency levels (there are more)
• ANY (writes only)
• ONE
• TWO
• THREE
• QUORUM
• LOCAL_ONE
• LOCAL_QUORUM
ALL for immediate consistency (high latency)
ONE/ANY for eventual consistency (lowest latency)
27. Internals
Immediate consistency
• Combine read and write consistency levels to get immediate consistency, following this rule:
If (nodes_written + nodes_read) > RF
then immediate consistency
• Examples with RF = 3
• Write: ALL, Read: ONE
• Write: QUORUM, Read: QUORUM
• Write: ONE, Read: ALL
• Is stale data really a problem?
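The rule above is simple arithmetic, and a short Scala sketch makes it easy to check combinations. The `Level` type and the function names are illustrative, not part of any driver API. Only a few of the levels are modeled, and QUORUM is taken as a majority of the replicas (RF / 2 + 1 with integer division).

```scala
object ConsistencySketch {
  sealed trait Level
  case object One extends Level
  case object Two extends Level
  case object Three extends Level
  case object Quorum extends Level
  case object All extends Level

  // Number of replicas that must acknowledge a request for a given replication factor.
  def required(level: Level, rf: Int): Int = level match {
    case One    => 1
    case Two    => 2
    case Three  => 3
    case Quorum => rf / 2 + 1
    case All    => rf
  }

  // The slide's rule: reads see the latest write when the read set and the
  // write set must overlap, i.e. nodes_written + nodes_read > RF.
  def isImmediatelyConsistent(write: Level, read: Level, rf: Int): Boolean =
    required(write, rf) + required(read, rf) > rf
}
```

With RF = 3 the three examples from the slide all satisfy the rule, while writing at ONE and reading at ONE (1 + 1 = 2) or at QUORUM (1 + 2 = 3) does not.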
29. CQL
Cassandra Query Language
• CQL is declarative like SQL, and the very basic structure of the query component of the language (SELECTs, INSERTs, …) is the same.
• But there are enough differences that one should not approach it in the same way as conventional SQL (because Cassandra is not relational!)
30. CQL
Cassandra Query Language
CREATE TABLE authors(
name text,
gender text,
born int,
died int,
bio text,
genre set<text>,
PRIMARY KEY (name));
ALTER TABLE authors ADD picture blob;
ALTER TABLE authors DROP gender;
DROP TABLE authors;
31. CQL
Cassandra Query Language
Cassandra has upserts. What happens if I want to update the name of an author?
SELECT * FROM authors LIMIT 10;
SELECT * FROM authors
WHERE name = 'Herman Melville';
INSERT INTO authors(name)
VALUES ('Edgar Allan Poe');
UPDATE authors SET born = 1797
WHERE name = 'Mary Shelley';
UPDATE authors SET genre = genre + {'Gothic literature'}
WHERE name = 'Mary Shelley';
DELETE FROM authors
WHERE name = 'Charles Dickens';
32. CQL
Invalid queries
• SELECT * FROM authors WHERE born = 1797;
• SELECT * FROM authors WHERE name LIKE 'Mar%';
• SELECT * FROM authors WHERE NOT name = 'Mary Shelley';
• SELECT * FROM authors WHERE name = 'Mary Shelley' OR name = 'Edgar Allan Poe';
• SELECT max(born) FROM authors GROUP BY genre;
• SELECT * FROM authors ORDER BY name;
• SELECT * FROM authors JOIN books ON authors.name = books.author;
33. CQL
How to solve the problem?
Create tables that respond to specific queries
Logged batch to the rescue
Actually, this won’t work for two authors born in 1819: try to insert (George Eliot, 1819)
CREATE TABLE authors_by_year (
born int,
name text,
PRIMARY KEY (born));
INSERT INTO authors_by_year (born, name)
VALUES (1797, 'Mary Shelley');
INSERT INTO authors_by_year (born, name)
VALUES (1819, 'Herman Melville');
SELECT * FROM authors_by_year
WHERE born = 1797;
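Why the second author born in 1819 is a problem can be shown with plain Scala collections. In this sketch a table is modeled as a collection keyed by its primary key, which is a simplification. With PRIMARY KEY (born) an insert for an existing year is an upsert and replaces the row. The second function models a hypothetical alternative, PRIMARY KEY (born, name), which the slide does not show but which would keep one row per author.

```scala
object UpsertSketch {
  // authors_by_year with PRIMARY KEY (born): at most one row per birth year.
  // Inserting an existing key silently overwrites the previous row (upsert).
  def insertByYear(table: Map[Int, String], born: Int, name: String): Map[Int, String] =
    table + (born -> name)

  // Hypothetical variant with PRIMARY KEY (born, name): one row per (year, author).
  def insertByYearAndName(table: Set[(Int, String)], born: Int, name: String): Set[(Int, String)] =
    table + ((born, name))
}
```

Inserting Herman Melville and then George Eliot with `insertByYear` leaves only George Eliot under 1819, while `insertByYearAndName` keeps both.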
34. CQL
Partition key and primary key
[Figure labels: Partition key, Primary key, Partition, Row]
• A partition is a list of rows and has a partition key, simple or compound
• Every row has its own primary key, composed of the partition key and optional additional columns (called clustering columns); rows are ordered by them inside the partition
• Partition key and primary key work somewhat like a two-level key
• The partition key determines the token, so a partition cannot be split between nodes
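One way to picture the two-level key is as a map of sorted maps: the outer key is the partition key, the inner key is the clustering column. The Scala sketch below uses that model with illustrative names (an `author` partition key and a `title` clustering column are assumptions chosen for the example). It is a mental model only, not how Cassandra stores data on disk.

```scala
import scala.collection.immutable.TreeMap

object PartitionSketch {
  final case class Row(title: String, published: Option[Int])

  // Partition key (author) -> rows kept sorted by the clustering column (title).
  type Table = Map[String, TreeMap[String, Row]]

  // Inserting with an existing (author, title) pair replaces the row, as an upsert would.
  def insert(table: Table, author: String, row: Row): Table = {
    val partition = table.getOrElse(author, TreeMap.empty[String, Row])
    table.updated(author, partition.updated(row.title, row))
  }

  // Reading a partition returns its rows already ordered by the clustering column.
  def rowsOf(table: Table, author: String): List[Row] =
    table.get(author).map(_.values.toList).getOrElse(Nil)
}
```

Looking up a whole partition needs only the outer key, which is why queries that fix the partition key are cheap, and rows come back in clustering order regardless of insertion order.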
35. CQL
Partition example
CREATE TABLE books(
author text,
born int static,
title text,
published int,
PRIMARY KEY ((author), title));
INSERT INTO books(author, title) VALUES
('Melville', 'White-Jacket');
UPDATE books SET born = 1819 WHERE author = 'Melville';
INSERT INTO books(author, title, published)
VALUES ('Melville', 'Moby Dick', 1851);
36. CQL
Partition example
SELECT * FROM books WHERE author = 'Melville';
author | title | born | published
----------+--------------+------+-----------
Melville | Moby Dick | 1819 | 1851
Melville | White-Jacket | 1819 | null
37. CQL
Filtering and ordering are possible on clustering columns
SELECT * FROM books WHERE author = 'Melville'
ORDER BY title DESC;
author | title | born | published
----------+--------------+------+-----------
Melville | White-Jacket | 1819 | null
Melville | Moby Dick | 1819 | 1851
SELECT * FROM books WHERE author = 'Melville'
AND title >= 'O';
author | title | born | published
----------+--------------+------+-----------
Melville | White-Jacket | 1819 | null
38. CQL
Filtering and ordering are possible on clustering columns
SELECT * FROM books WHERE title >= 'O';
InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
SELECT * FROM books WHERE title >= 'O' ALLOW FILTERING;
author | title | born | published
----------+--------------+------+-----------
Melville | White-Jacket | 1819 | null
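42. PHANTOM
Books
// Assumed import for Phantom 1.x; the package name depends on the Phantom version in use
import com.websudos.phantom.dsl._

class BooksTable extends CassandraTable[BooksDAO, Book] {
  override val tableName = "books"

  // Partition key, static column, clustering column, and a nullable regular column
  object author extends StringColumn(this) with PartitionKey[String]
  object born extends IntColumn(this) with StaticColumn[Int]
  object title extends StringColumn(this) with ClusteringOrder[String]
  object published extends OptionalIntColumn(this)

  // Maps a result row to the Book record
  override def fromRow(row: Row): Book = {
    Book(
      author = author(row),
      born = born(row),
      title = title(row),
      published = published(row)
    )
  }
}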