
Cassandra sharding and consistency

If you are only familiar with relational databases, Cassandra can be confusing. It is designed to shard, and it guarantees consistency in an interesting (and frustrating) way.

  1. Apache Cassandra: Sharding and Consistency
  2. € whoami
     ● Federico Razzoli
     ● Freelance consultant
     ● Writing SQL since MySQL 2.23
       hello@federico-razzoli.com
     ● I love open source, sharing, collaboration, win-win, etc.
     ● I love MariaDB, MySQL, Postgres, etc.
       ○ Even Db2, somehow
  3. What Cassandra is for
     ● Get data by a partition key, ordered by a clustering key (together they form the primary key)
         SELECT * FROM conf WHERE year = 2019 ORDER BY attendees;
     ● Simple aggregations
         SELECT COUNT(*) FROM conf WHERE year = 2019;
     ● No JOINs
     ● No filtering by other columns (by default)
     ● Different queries on the same data? Multiple tables with different primary keys
     ● Data is eventually mostly consistent
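The advice to answer different queries with multiple tables can be illustrated with a toy Python model (not Cassandra code, and not the author's). Two dictionaries stand in for two hypothetical tables, one keyed by year and one keyed by name; the `add_conf` helper and the sample conferences are made up for illustration. Writing each row twice keeps both queries to a single partition lookup, while searching the year-keyed table by name has to scan every partition, which is roughly why Cassandra refuses that kind of filtering by default.

```python
from collections import defaultdict

# Hypothetical in-memory "tables": partition key -> rows stored in that partition.
conf_by_year = defaultdict(list)  # answers "which conferences happened in a year?"
conf_by_name = defaultdict(list)  # answers "which editions of a conference exist?"


def add_conf(name, year, attendees):
    """Denormalize: the application writes the same conference to both tables."""
    row = {"name": name, "year": year, "attendees": attendees}
    conf_by_year[year].append(row)
    conf_by_name[name].append(row)


def confs_in_year(year):
    # One partition lookup in the table keyed by year.
    return conf_by_year.get(year, [])


def editions_of(name):
    # One partition lookup in the table keyed by name.
    return conf_by_name.get(name, [])


def editions_of_by_scan(name):
    # Filtering the year-keyed table by a non-key column: every partition is read.
    return [row for rows in conf_by_year.values() for row in rows if row["name"] == name]


add_conf("ConfA", 2019, 1200)
add_conf("ConfB", 2019, 800)
add_conf("ConfA", 2018, 1100)
print([r["year"] for r in editions_of("ConfA")])              # [2019, 2018]
print(editions_of_by_scan("ConfA") == editions_of("ConfA"))  # True
```

Both lookups return the same rows; the difference is how much data has to be touched to find them.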
  4. CREATE TABLE
         -- a database by any other name would contain as sweet tables
         CREATE KEYSPACE stats WITH REPLICATION = {
             'class': 'NetworkTopologyStrategy',
             -- london dc has 2 copies of each row on different nodes,
             -- paris dc has 3
             'datacenter_london': 2,
             'datacenter_paris': 3
         };

         CREATE TABLE stats.conf (
             name TEXT,
             year SMALLINT,
             attendees SMALLINT,
             -- rows with the same year are stored on the same nodes,
             -- ordered by attendees
             PRIMARY KEY (year, attendees)
         ) WITH CLUSTERING ORDER BY (attendees DESC);
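The following toy Python model mimics how the `stats.conf` table organizes its rows: one partition per `year`, with the rows inside a partition kept in `attendees DESC` order, so reading a year is one lookup with no sorting at query time. It is a sketch under simplifying assumptions (a single in-memory dictionary, no replication, no on-disk storage); the `ConfTable` class and the sample rows are hypothetical.

```python
import bisect
from collections import defaultdict


class ConfTable:
    """Toy model of stats.conf:
    PRIMARY KEY (year, attendees) WITH CLUSTERING ORDER BY (attendees DESC).
    """

    def __init__(self):
        # year (partition key) -> sorted list of (-attendees, name).
        # Negating attendees makes ascending list order mean attendees DESC.
        self._partitions = defaultdict(list)

    def insert(self, year, attendees, name):
        rows = self._partitions[year]
        key = -attendees
        i = bisect.bisect_left(rows, (key,))
        if i < len(rows) and rows[i][0] == key:
            rows[i] = (key, name)  # same primary key: the write overwrites (upsert)
        else:
            rows.insert(i, (key, name))  # keep the partition in clustering order

    def select(self, year):
        # Single partition lookup; rows are already in clustering order.
        return [(-key, name) for key, name in self._partitions.get(year, [])]

    def count(self, year):
        return len(self._partitions.get(year, []))


table = ConfTable()
table.insert(2019, 300, "ConfA")
table.insert(2019, 1200, "ConfB")
table.insert(2019, 300, "ConfC")  # same (year, attendees) as ConfA: replaces it
print(table.select(2019))         # [(1200, 'ConfB'), (300, 'ConfC')]
```

The third insert shows a side effect of this primary key: two conferences with the same year and the same number of attendees share a primary key, so the later write replaces the earlier one. Cassandra behaves the same way, because an INSERT with an existing primary key overwrites the row.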
  5. Cassandra Cluster
     ● Clients can read from / write to any node
     ● The contacted node acts as the coordinator
     ● The coordinator computes a hash of the partition key (the token), which determines which node(s) own the rows
     ● The coordinator sends data to / requests data from the closest owner(s)
     ● Replication is asynchronous
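A minimal sketch of the coordinator's lookup, assuming a simplified ring: each node owns one token, the partition key is hashed to a token, the first node clockwise owns the row, and the next nodes clockwise hold the extra copies (roughly SimpleStrategy; NetworkTopologyStrategy, used in the CREATE KEYSPACE example, also places replicas per datacenter and rack). MD5 is used here only because it ships with Python; Cassandra's default partitioner is Murmur3Partitioner, with a different hash and a signed token range. The node names and tokens are hypothetical.

```python
import bisect
import hashlib


def token(partition_key):
    """Hash a partition key to a position on a 64-bit ring.

    Assumption: MD5 for simplicity, not Cassandra's default Murmur3 hash.
    """
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; one token per node, no virtual nodes.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, partition_key, rf):
        """Owner = first node whose token >= the key's token (wrapping around);
        the next rf - 1 nodes clockwise hold the copies."""
        n = len(self._ring)
        start = bisect.bisect_left(self._tokens, token(partition_key)) % n
        return [self._ring[(start + i) % n][1] for i in range(min(rf, n))]


# Hypothetical 4-node cluster with evenly spaced tokens.
ring = Ring({f"node{i}": i * 2**62 for i in range(4)})
print(ring.replicas(2019, 3))  # three distinct, consecutive nodes; same answer on every call
```

Because the mapping depends only on the key and the ring, any node acting as coordinator computes the same owners without asking anyone.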
  6. Consistency levels
     ● Consistency vs speed
     ● Chosen separately for reads and writes
     ● The default consistency level is ONE:
       ○ You write to one node and don't wait for the change to be replicated
       ○ You read possibly stale data from one node
     ● TWO, THREE, QUORUM
       ○ Writes and reads must be acknowledged by 2, 3, or a majority of the replicas
     ● With multiple datacenters, any of these levels can be very expensive
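Why does ONE risk stale reads while QUORUM does not? If a write is acknowledged by W replicas and a read contacts R replicas, the read is guaranteed to reach at least one up-to-date replica only when W + R > RF. The small Python check below is a brute-force illustration (it ignores failures, hints and timestamps) that enumerates every possible pair of replica sets to confirm this; `quorum` and `always_overlap` are names chosen for the example.

```python
from itertools import combinations


def quorum(replicas):
    """Majority of the replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def always_overlap(rf, w, r):
    """True if every set of w replicas that acknowledged a write shares at least
    one replica with every set of r replicas that a read may contact."""
    nodes = range(rf)
    return all(
        set(ws) & set(rs)
        for ws in combinations(nodes, w)
        for rs in combinations(nodes, r)
    )


print(always_overlap(3, 1, 1))                  # False: ONE/ONE can miss the write
print(always_overlap(3, quorum(3), quorum(3)))  # True: QUORUM/QUORUM always overlap
```

For the `stats` keyspace in the CREATE KEYSPACE example, QUORUM counts all 2 + 3 = 5 replicas, so it needs `quorum(5) == 3` acknowledgements; since London holds only two copies, every QUORUM request from London must wait for at least one Paris replica.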
  7. Consistency levels: local and paranoid
     ● Faster:
       ○ LOCAL_ONE
       ○ LOCAL_QUORUM
       ○ If the connection between datacenters breaks, data in the other datacenters will be stale
     ● More paranoid:
       ○ EACH_QUORUM
       ○ ALL
       ○ Avoid inconsistencies in any datacenter
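The next sketch computes how many acknowledgements each level requires for the replication factors in the CREATE KEYSPACE example. `required_acks` is a hypothetical helper written for this illustration (not a driver API), and it covers only the levels named on these slides; the key `"*"` means that replicas in any datacenter count.

```python
def quorum(n):
    """Majority of n replicas."""
    return n // 2 + 1


def required_acks(level, rf_per_dc, local_dc):
    """Acknowledgements a coordinator in local_dc must collect, per scope.

    "*" means replicas in any datacenter count towards the total.
    """
    total = sum(rf_per_dc.values())
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return {"*": fixed[level]}
    if level == "QUORUM":
        return {"*": quorum(total)}
    if level == "ALL":
        return {"*": total}
    if level == "LOCAL_ONE":
        return {local_dc: 1}
    if level == "LOCAL_QUORUM":
        return {local_dc: quorum(rf_per_dc[local_dc])}
    if level == "EACH_QUORUM":
        return {dc: quorum(rf) for dc, rf in rf_per_dc.items()}
    raise ValueError(f"level not modelled here: {level}")


# Replication factors from the CREATE KEYSPACE example.
stats_rf = {"datacenter_london": 2, "datacenter_paris": 3}
for level in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, stats_rf, "datacenter_london"))
```

One consequence worth noticing: a majority of 2 is 2, so with only two London replicas, LOCAL_QUORUM from London needs both of them, and a single London node being down is enough to make it fail there.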
  8. Cost of strictness
     ● Stricter consistency levels mean:
       ○ More communication between nodes, higher latency
       ○ Queries may fail if nodes crash or connections are lost
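A simplified latency model makes the cost of strictness visible: if the coordinator waits for the k fastest live replicas, its response time is the k-th smallest round-trip time, and the request fails when fewer than k replicas are reachable. The latencies are made-up numbers for a coordinator in London, `coordinator_latency` is a hypothetical name, and real Cassandra adds timeouts, retries and speculative reads that this sketch ignores.

```python
def coordinator_latency(replica_latencies_ms, acks_needed):
    """Toy model: time until the coordinator has acks_needed answers.

    replica_latencies_ms: round-trip times of the replicas that are reachable.
    Returns None when too few replicas are reachable (the request fails).
    """
    if acks_needed > len(replica_latencies_ms):
        return None
    return sorted(replica_latencies_ms)[acks_needed - 1]


# Hypothetical round-trip times (ms) from a London coordinator.
london = [1, 2]
paris = [40, 42, 45]
print(coordinator_latency(london + paris, 1))  # 1: ONE
print(coordinator_latency(london + paris, 3))  # 40: QUORUM must hear from Paris
print(coordinator_latency(london + paris, 5))  # 45: ALL waits for the slowest replica
print(coordinator_latency(london, 3))          # None: Paris unreachable, QUORUM fails
```

Each step up in strictness moves the answer further down the sorted list, and the last call shows the other cost from the slide: losing a link can turn a slow query into a failed one.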
  9. Interesting inconsistencies
     ● 2 UPDATEs on the same row/column?
       ○ The latest wins
       ○ Who decides which is the latest? Sometimes, network/node latency :)
     ● Rows are DELETEd while another node that owns them is down?
       ○ That node may not delete those rows
     ● You DELETE a row and, immediately after, I UPDATE it on the same node?
       ○ The row may not be DELETEd
     One second left, no time to explain :D
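The "latest wins" rule is last-write-wins on per-cell write timestamps, which are usually assigned by the client driver or the coordinator, not taken from arrival order. The sketch below models a cell as a value plus a timestamp and a DELETE as a tombstone (a cell with no value); it is an illustration under those assumptions, not Cassandra's storage code, and the `Cell` and `reconcile` names are hypothetical. On equal timestamps the tombstone wins, which matches Cassandra's rule for a deletion against a live value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    value: Optional[str]  # None marks a tombstone (the cell was DELETEd)
    timestamp: int        # write timestamp, set by the client or the coordinator


def reconcile(*versions):
    """Last write wins: highest timestamp; on a timestamp tie the tombstone wins.

    Ties between two live values are not modelled faithfully here.
    """
    return max(versions, key=lambda c: (c.timestamp, c.value is None))


# Two UPDATEs of the same column: the later timestamp wins, whatever arrived first.
first = Cell("ConfA", timestamp=105)   # e.g. sent by a client whose clock is ahead
second = Cell("ConfB", timestamp=100)
print(reconcile(first, second).value)  # ConfA

# A DELETE followed "immediately" by an UPDATE of the same row.
delete = Cell(None, timestamp=200)
print(reconcile(delete, Cell("ConfC", timestamp=199)).value)  # None: still deleted
print(reconcile(delete, Cell("ConfC", timestamp=201)).value)  # ConfC: the row is back
```

A replica that was down during a DELETE still holds the old live cell, so a read at ONE served by that replica returns the deleted value, while a read that reconciles it with the tombstone does not. If the tombstone is purged from the other replicas (after `gc_grace_seconds`) before that node is repaired, the deleted data can reappear.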
  10. Thank you kindly!
      https://federico-razzoli.com
      info@federico-razzoli.com
