Cassandra lesson learned - extended


  1. Cassandra - lesson learned (Andrzej Ludwikowski)
  2. About me? - www.ludwikowski.info - github.com/aludwiko - @aludwikowski
  3. Why Cassandra? - Big Data: - Volume (petabytes of data, trillions of entities) - Velocity (real-time, streams, millions of transactions per second) - Variety (unstructured, semi-structured and structured data) - writes are cheap, reads are ??? - near-linear horizontal scaling (for the right use cases) - fully distributed, with no single point of failure - data replication by default
  4. Cassandra vs CAP? - CAP theorem: pick two
  7. Origins? 2010
  8. Name?
  10. Write path (diagram: a client driver and a four-node cluster, Node 1 to Node 4)
  11. Write path - Any node can coordinate any request (NSPOF: no single point of failure)
  12. Write path - Any node can coordinate any request (NSPOF) - Replication Factor (diagram: RF=3, so every write goes to three of the four nodes)
  13. Write path - Any node can coordinate any request (NSPOF) - Replication Factor - Consistency Level (diagram: RF=3, CL=2, so the coordinator waits for two of the three replicas to acknowledge)
  14. Write path - consistent hashing - Token ring from -2^63 to 2^63 - 1 (diagram: the ring simplified to tokens 0 to 100, shared by Node 1 to Node 4)
  15. Write path - consistent hashing - Token ring from -2^63 to 2^63 - 1 - Partitioner: partition key -> token (diagram: the four nodes own the ranges 0-25, 26-50, 51-75 and 76-100; the client's partition key hashes to token 77)
  16. Write path - consistent hashing - Token ring from -2^63 to 2^63 - 1 - Partitioner: partition key -> token (same diagram, token 77)
  17. Write path - consistent hashing - Token ring from -2^63 to 2^63 - 1 - Partitioner: partition key -> token (diagram: token 77 is stored on three nodes)
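The consistent-hashing slides (14 to 17) can be sketched in a few lines of Python. This is a toy model under stated assumptions. It uses one token per node (no vnodes) and SimpleStrategy-style placement (the owning node plus the next nodes clockwise). MD5 from `hashlib` stands in for the Murmur3 hash that Cassandra's default partitioner uses. `token_for` and `TokenRing` are hypothetical names.

```python
from __future__ import annotations

import bisect
import hashlib

MIN_TOKEN = -2**63      # Murmur3Partitioner token range: -2^63 .. 2^63 - 1
MAX_TOKEN = 2**63 - 1


def token_for(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token.

    Stand-in only: MD5 keeps the sketch dependency-free; Cassandra's
    default partitioner uses Murmur3 instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    """One token per node; a node owns the range (previous token, own token]."""

    def __init__(self, node_tokens: dict[str, int]):
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def replicas_for_token(self, token: int, rf: int) -> list[str]:
        # The first node whose token is >= the key's token owns it (wrapping past the end).
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        # SimpleStrategy-style placement: owner, then the next rf - 1 nodes clockwise.
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(min(rf, len(self.entries)))]

    def replicas(self, partition_key: str, rf: int) -> list[str]:
        return self.replicas_for_token(token_for(partition_key), rf)
```

Take the slides' simplified ring, with Node 1 to Node 4 ending at tokens 25, 50, 75 and 100. In this sketch, `TokenRing({"Node 1": 25, "Node 2": 50, "Node 3": 75, "Node 4": 100}).replicas_for_token(77, 3)` returns `["Node 4", "Node 1", "Node 2"]`: token 77 lands in the range 76-100 and the two following nodes hold the other copies.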
  18. DEMO
  19. Write path - problems? (diagram: the client's write of token 77 to its three replicas)
  20. Write path - problems? - Hinted handoff (same diagram)
  22. Write path - problems? - Hinted handoff - Retry idempotent inserts (built-in retry policies)
  23. Write path - problems? - Hinted handoff - Retry idempotent inserts (built-in retry policies) - Lightweight transactions (Paxos)
  24. Write path - problems? - Hinted handoff - Retry idempotent inserts (built-in retry policies) - Lightweight transactions (Paxos) - Batches
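Hinted handoff, the first answer on the write-path problem slides, can be illustrated with a toy coordinator. Replicas that are down at write time get a hint stored on the coordinator, and the hint is replayed when the replica comes back. This sketches the idea only, not Cassandra's implementation. Real hints are persisted, are only collected for a limited window, and do not count towards the consistency level except for CL=ANY. `Coordinator`, `mark_down` and `mark_up` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict


class Coordinator:
    """Toy coordinator: writes to reachable replicas, keeps hints for the others."""

    def __init__(self, replicas: list[str]):
        self.storage: dict[str, dict[str, str]] = {r: {} for r in replicas}
        self.up: set[str] = set(replicas)                 # currently reachable replicas
        self.hints: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def write(self, key: str, value: str, replicas: list[str], consistency_level: int) -> int:
        live = [r for r in replicas if r in self.up]
        if len(live) < consistency_level:
            # Not enough live replicas: fail fast without writing anything
            # (compare Cassandra's UnavailableException).
            raise RuntimeError(f"unavailable: {len(live)} live, {consistency_level} required")
        for replica in replicas:
            if replica in self.up:
                self.storage[replica][key] = value
            else:
                # Replica is down: remember the write so it can be handed off later.
                self.hints[replica].append((key, value))
        return len(live)

    def mark_down(self, replica: str) -> None:
        self.up.discard(replica)

    def mark_up(self, replica: str) -> None:
        """Replica is reachable again: replay (hand off) its hints, oldest first."""
        self.up.add(replica)
        for key, value in self.hints.pop(replica, []):
            self.storage[replica][key] = value
```

A write with RF=3 and CL=2 still succeeds while one replica is down. The missing copy is delivered once `mark_up` runs, without waiting for a full repair.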
  25. Write path - node level
  26. Write path - why so fast? - Commit log - append only
  27. Write path - why so fast?
  28. Write path - why so fast? 50,000 t/s = 50 t/ms = 5 t/100 µs = 1 t/20 µs
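The figures on slide 28 express one rate in ever smaller time units. The last step gives the average time budget per write that the rate implies:

$$
50\,000\ \text{t/s} = 50\ \text{t/ms} = 5\ \text{t}/100\,\mu\text{s} = 1\ \text{t}/20\,\mu\text{s},
\qquad
\frac{1\ \text{s}}{50\,000\ \text{t}} = \frac{10^{6}\ \mu\text{s}}{5 \times 10^{4}\ \text{t}} = 20\ \mu\text{s per t}
$$

A 20 µs budget per write helps explain why the write path relies on sequential appends rather than random disk I/O.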
  29. Write path - why so fast? - Commit log - append only - Periodic (every 10 s by default) or batch sync to disk (diagram: RF=2, CL=2)
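The append-only commit log and the two sync modes on slide 29 can be illustrated with a toy single-node write path in Python. This is a simplified sketch, not Cassandra's storage engine. The JSON-lines log format, the `ToyNode` and `demo` names, and the per-write fsync in batch mode are assumptions (Cassandra groups batch-mode writes before syncing). The periodic timer is only described in a comment.

```python
import json
import os
import tempfile


class ToyNode:
    """Single-node write path: append to a commit log, then update the memtable."""

    def __init__(self, commit_log_path: str, sync: str = "periodic"):
        if sync not in ("periodic", "batch"):
            raise ValueError("sync must be 'periodic' or 'batch'")
        self.log = open(commit_log_path, "a", encoding="utf-8")
        self.sync = sync
        self.memtable: dict = {}          # in memory, latest value per key
        self.sstables: list = []          # immutable runs flushed from the memtable

    def write(self, key, value) -> None:
        # 1. Sequential append only: no seeks, no read-before-write.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        if self.sync == "batch":
            self.fsync()                  # durable before the write is acknowledged
        # In "periodic" mode a background timer (not shown) would call fsync(),
        # e.g. every 10 seconds, so a crash can lose this node's most recent writes.
        # 2. Update the in-memory table.
        self.memtable[key] = value

    def fsync(self) -> None:
        self.log.flush()
        os.fsync(self.log.fileno())

    def flush_memtable(self) -> None:
        """Write the memtable out as a key-sorted, immutable SSTable."""
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def close(self) -> None:
        self.fsync()
        self.log.close()


def demo():
    """Write two keys in batch mode, flush, and return (log lines, sstables)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commitlog.jsonl")
        node = ToyNode(path, sync="batch")
        node.write("b", 2)
        node.write("a", 1)
        node.flush_memtable()
        node.close()
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
    return lines, node.sstables
```

The log keeps writes in arrival order, while the flushed SSTable is sorted by key. Only the in-memory structure is ever updated in place.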
  30. Write path - why so fast? - Commit log - append only - Periodic or batch sync to disk - Network topology aware (diagram: nodes placed in Rack 1 and Rack 2; RF=2, CL=2)
  31. Write path - why so fast? - Commit log - append only - Periodic or batch sync to disk - Network topology aware (diagram: a client and nodes in an Asia DC and a Europe DC)
  32. Read path - Most recent write wins - Eager retries - In memory: MemTable, Row Cache, Bloom Filters, Key Caches, Partition Summaries - On disk: Partition Indexes, SSTables (diagram: RF=3, CL=3; the three replicas return versions with timestamps 67, 99 and 88, so the version with timestamp 99 is returned)
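The "most recent write wins" rule on the read-path slide can be sketched as a reconciliation step on the coordinator. With CL=3, the coordinator collects one response per replica and keeps the value with the highest write timestamp. `Cell` and `reconcile` are hypothetical names, and the stale-replica list is an illustration of what read repair would target. Ties on equal timestamps, tombstones and digest reads are left out.

```python
from __future__ import annotations

from typing import NamedTuple


class Cell(NamedTuple):
    value: str
    timestamp: int        # write timestamp attached to the cell


def reconcile(responses: dict[str, Cell]) -> tuple[Cell, list[str]]:
    """Last write wins: pick the cell with the highest timestamp.

    Returns the winning cell and the replicas holding an older version
    (candidates for read repair). Ties on equal timestamps are not modelled.
    """
    if not responses:
        raise ValueError("no replica responses")
    winner = max(responses.values(), key=lambda cell: cell.timestamp)
    stale = sorted(r for r, cell in responses.items() if cell.timestamp < winner.timestamp)
    return winner, stale
```

With the slide's timestamps 67, 99 and 88 (assigned to nodes arbitrarily here), the version stamped 99 is returned. The other two replicas are reported as stale.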
  33. Immediate vs. eventual consistency - if (writeCL + readCL) > replication_factor, every read overlaps every acknowledged write (immediate consistency) - writeCL=ALL, readCL=ONE - writeCL=ONE, readCL=ALL - writeCL=QUORUM, readCL=QUORUM - https://www.ecyrd.com/cassandracalculator/ (diagram: RF=3)
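The rule on slide 33 is simple overlap arithmetic. If the write touched W replicas, the read asks R replicas and W + R > RF, at least one replica in the read set holds the acknowledged write. A minimal sketch follows, assuming a single datacenter and only the levels named on the slide. `LOCAL_QUORUM`, `EACH_QUORUM` and similar levels are not modelled, and the function names are hypothetical.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond for a single-datacenter consistency level."""
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    if level not in levels:
        raise ValueError(f"unsupported level: {level}")
    return levels[level]


def is_immediately_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read replica set overlaps every write replica set: W + R > RF."""
    return replicas_for(write_level, rf) + replicas_for(read_level, rf) > rf
```

With RF=3, the three combinations on the slide all satisfy the rule (3 + 1, 1 + 3 and 2 + 2 are all greater than 3). ONE/ONE (1 + 1) and ONE/QUORUM (1 + 2) do not.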
  34. Modeling - new mindset - QDD, Query Driven Development - Nesting is OK - Duplication is OK - Writes are cheap - No joins
  35. QDD - Conceptual model - Technology independent - Chen notation
  36. QDD - Application workflow
  37. QDD - Logical model - Chebotko diagram
  38. QDD - Physical model - Technology dependent - Analysis and validation (finding problems) - Physical optimization (fixing problems) - Data types
  39. Physical storage - Primary key - Partition key. CREATE TABLE videos ( id int, title text, runtime int, year int, PRIMARY KEY (id) ); Rows (id | title | runtime | year): 1 | dzien swira | 93 | 2002; 2 | chlopaki nie placza | 96 | 2000; 3 | psy | 104 | 1992; 4 | psy 2 | 96 | 1994. On disk every id is its own partition, e.g. partition 1 holds title = dzien swira, runtime = 93, year = 2002. SELECT * FROM videos WHERE title = 'dzien swira'; is rejected unless run with ALLOW FILTERING, because title is not the partition key.
  40. Physical storage - Primary key (could be compound) - Partition key - Clustering column (order, uniqueness). CREATE TABLE videos_with_clustering ( title text, runtime int, year int, PRIMARY KEY ((title), year) ); Rows (title | year | runtime): godzilla | 1954 | 98; godzilla | 1998 | 140; godzilla | 2014 | 123; psy | 1992 | 104. On disk the godzilla partition holds the rows 1954 (runtime 98), 1998 (runtime 140) and 2014 (runtime 123), sorted by year; the psy partition holds 1992 (runtime 104). Queries: SELECT * FROM videos_with_clustering WHERE title = 'godzilla'; SELECT * FROM videos_with_clustering WHERE title = 'godzilla' AND year > 1998;
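The storage layout of `videos_with_clustering` can be modelled as a map from partition key to a list of rows kept sorted by the clustering column. The sketch shows why `WHERE title = 'godzilla' AND year > 1998` is cheap: it reads a contiguous slice of one sorted partition. This is an in-memory illustration with a hypothetical `ClusteredTable` class, not the byte-level SSTable format.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class ClusteredTable:
    """Toy model of PRIMARY KEY ((title), year)."""

    def __init__(self):
        # partition key (title) -> rows sorted by clustering column: [(year, runtime)]
        self.partitions: dict[str, list[tuple[int, int]]] = defaultdict(list)

    def insert(self, title: str, year: int, runtime: int) -> None:
        rows = self.partitions[title]
        i = bisect.bisect_left(rows, (year,))
        if i < len(rows) and rows[i][0] == year:
            rows[i] = (year, runtime)     # same primary key: the insert is an upsert
        else:
            rows.insert(i, (year, runtime))

    def select(self, title: str, year_gt: int | None = None) -> list[tuple[int, int]]:
        rows = self.partitions.get(title, [])   # only one partition is touched
        if year_gt is None:
            return list(rows)
        # Rows are sorted by year, so "year > year_gt" is a contiguous slice.
        start = bisect.bisect_right(rows, (year_gt, float("inf")))
        return rows[start:]
```

Inserting the slide's four rows in any order yields the godzilla partition as `[(1954, 98), (1998, 140), (2014, 123)]`. `select("godzilla", year_gt=1998)` returns only `[(2014, 123)]`. Inserting the same (title, year) again overwrites instead of duplicating, which is the "uniqueness" role of the clustering column.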
  41. Physical storage - Primary key (could be compound) - Partition key (could be composite) - Clustering column (order, uniqueness). CREATE TABLE videos_with_composite_pk ( title text, runtime int, year int, PRIMARY KEY ((title, year)) ); Rows (title | year | runtime): godzilla | 1954 | 98; godzilla | 1998 | 140; godzilla | 2014 | 123; psy | 1992 | 104. On disk every (title, year) pair is its own partition: godzilla:1954 (runtime 98), godzilla:1998 (runtime 140), godzilla:2014 (runtime 123), psy:1992 (runtime 104). Query: SELECT * FROM videos_with_composite_pk WHERE title = 'godzilla' AND year = 1954;
  42. Modeling - clustering column(s). Q: Retrieve videos an actor has appeared in (newest first).
  43. Modeling - clustering column(s). CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ( ) ) WITH CLUSTERING ORDER BY ( ); Q: Retrieve videos an actor has appeared in (newest first).
  44. Modeling - clustering column(s). CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  45. Modeling - clustering column(s). CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  46. Modeling - clustering column(s). CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id, character_name) ) WITH CLUSTERING ORDER BY (added_date desc); Q: Retrieve videos an actor has appeared in (newest first).
  47. Modeling - compound partition key. CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ( ) ) WITH CLUSTERING ORDER BY ( ); Q: Retrieve the last 1000 measurements from a given day.
  48. Modeling - compound partition key. CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id), date, event_time) ) WITH CLUSTERING ORDER BY (date desc, event_time desc); Q: Retrieve the last 1000 measurements from a given day.
  49. Modeling - compound partition key. CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id), date, event_time) ) WITH CLUSTERING ORDER BY (date desc, event_time desc); Q: Retrieve the last 1000 measurements from a given day. With one measurement per second, a single station's partition grows without bound: 1 day = 86,400 rows, 1 week = 604,800 rows, 1 month (30 days) = 2,592,000 rows, 1 year (365 days) = 31,536,000 rows.
  50. Modeling - compound partition key. CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Q: Retrieve the last 1000 measurements from a given day.
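Moving `date` into the partition key (slide 50) caps each partition at one day of data. A minimal sketch of how a writer could derive that key follows, assuming the `date` column holds the UTC day formatted as `YYYY-MM-DD` (the slides do not specify the format). `partition_key` and `rows_per_partition` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import datetime, timezone

SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 rows per day at 1 measurement/s


def partition_key(weather_station_id: str, event_time: datetime) -> tuple[str, str]:
    """Partition key for PRIMARY KEY ((weather_station_id, date), event_time).

    Assumes the date column holds the UTC day as 'YYYY-MM-DD'.
    """
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return weather_station_id, event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")


def rows_per_partition(days: int, per_second: int = 1) -> int:
    """Rows a single partition accumulates over `days` days."""
    return days * SECONDS_PER_DAY * per_second
```

With the slide 48 key, one partition covers every day a station reports, so `rows_per_partition(365)` gives 31,536,000 rows after a year. With the slide 50 key the relevant figure is always `rows_per_partition(1)`, i.e. 86,400. The "last 1000 measurements from a given day" query then reads the head of a single partition.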
  51. Modeling - TTL. CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time) ) WITH CLUSTERING ORDER BY (event_time desc); Retention policy: keep data only from the last week. INSERT INTO temperature_by_day … USING TTL 604800; (604,800 s = 7 × 24 × 3600 s = 7 days)
  52. Modeling - bitmap index. CREATE TABLE car ( year text, model text, color text, vehicle_id int, /* other columns */ PRIMARY KEY ((year, model, color), vehicle_id) ); Q: Find a car by year and/or model and/or color. Each car is inserted once per combination, with '' standing for "any value": INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...); INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...); SELECT * FROM car WHERE year = '2000' AND model = '' AND color = 'blue';
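The seven INSERT statements on slide 52 are the 2^3 - 1 = 7 ways of keeping or blanking each of the three attributes. The all-blank key is the one combination the slide also leaves out. A short sketch that generates them in the slide's order follows. It assumes year is stored as a string so that '' can act as the placeholder, and `bitmap_index_keys` and `query_key` are hypothetical helpers.

```python
from __future__ import annotations

from itertools import product

BLANK = ""   # placeholder meaning "any value" in the partition key


def bitmap_index_keys(year: str, model: str, color: str) -> list[tuple[str, str, str]]:
    """All partition keys a car is written under: every mix of real values and
    blanks, except the all-blank key (which the slide also leaves out)."""
    keys = []
    for keep_year, keep_model, keep_color in product((True, False), repeat=3):
        if not (keep_year or keep_model or keep_color):
            continue
        keys.append((year if keep_year else BLANK,
                     model if keep_model else BLANK,
                     color if keep_color else BLANK))
    return keys


def query_key(year: str | None = None, model: str | None = None,
              color: str | None = None) -> tuple[str, str, str]:
    """Partition key to read for a search on any subset of the three attributes."""
    return (year or BLANK, model or BLANK, color or BLANK)
```

`query_key(year="2000", color="blue")` gives `("2000", "", "blue")`, the same key the slide's SELECT uses. Every search becomes a single-partition read, and the cost is paid on writes: each car is stored seven times.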
  53. Modeling - wide rows. CREATE TABLE user ( email text, name text, age int, PRIMARY KEY (email) ); Q: Find user by email.
  54. Modeling - wide rows. CREATE TABLE user ( domain text, user text, name text, age int, PRIMARY KEY ((domain), user) ); Q: Find user by email. The email is split into domain (partition key) and user (clustering column), so all users of one domain share a partition.
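The wide-row version of the user table needs the application to split each email into its primary-key parts before writing or querying. A minimal sketch follows, assuming the split happens at the last `@`. `user_primary_key` and `partitions` are hypothetical helpers, and `partitions` only mimics how rows group and sort, not how they are stored.

```python
from __future__ import annotations

from collections import defaultdict


def user_primary_key(email: str) -> tuple[str, str]:
    """Split an email into (domain, user): partition key and clustering column."""
    user, sep, domain = email.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain, user


def partitions(emails: list[str]) -> dict[str, list[str]]:
    """Group emails the way the wide-row table stores them: one partition per
    domain, users sorted inside it by the clustering column."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for email in emails:
        domain, user = user_primary_key(email)
        grouped[domain].append(user)
    return {domain: sorted(users) for domain, users in grouped.items()}
```

One design point to check with real data: a single very popular domain turns into one very large partition.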
  55. Modeling - versioning with lightweight transactions. CREATE TABLE document ( id text, content text, version int, locked_by text, PRIMARY KEY ((id)) ); INSERT INTO document (id, content, version) VALUES ('my doc', 'some content', 1) IF NOT EXISTS; UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null; UPDATE document SET content = 'better content', version = 2, locked_by = null WHERE id = 'my doc' IF locked_by = 'andrzej';
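The lock-then-update protocol on slide 55 works because each conditional statement is applied atomically (by Paxos in Cassandra). A toy, single-process Python sketch makes the sequence visible. A `threading.Lock` stands in for Paxos, one object stands in for the 'my doc' row, and each method returns what CQL reports as `[applied]`. `DocumentRow`, `insert_if_not_exists` and `update_if` are hypothetical names.

```python
from __future__ import annotations

import threading


class DocumentRow:
    """One row of the document table with compare-and-set updates (toy model)."""

    def __init__(self):
        self._lock = threading.Lock()   # stand-in for Paxos: one winner at a time
        self.row: dict | None = None    # None: the row does not exist yet

    def insert_if_not_exists(self, doc_id: str, content: str, version: int) -> bool:
        with self._lock:
            if self.row is not None:
                return False            # [applied] = false
            self.row = {"id": doc_id, "content": content,
                        "version": version, "locked_by": None}
            return True

    def update_if(self, expected: dict, changes: dict) -> bool:
        """Apply `changes` only if every column in `expected` currently matches."""
        with self._lock:
            if self.row is None or any(self.row.get(col) != val
                                       for col, val in expected.items()):
                return False
            self.row.update(changes)
            return True
```

Replaying the slide's statements runs in four steps. The first insert applies, and a second user's `update_if({"locked_by": None}, ...)` is rejected while 'andrzej' holds the lock. The final update, conditioned on `locked_by = 'andrzej'`, writes version 2 and releases the lock.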
  56. Modeling - JSON with UDTs and tuples. JSON: { "title": "Example Schema", "type": "object", "properties": { "firstName": "andrzej", "lastName": "ludwikowski", "age": { "description": "Age in years", "type": "integer", "minimum": 0 } }, "x_dimension": "1", "y_dimension": "2" } CQL: CREATE TYPE age ( description text, type text, minimum int ); CREATE TYPE prop ( firstName text, lastName text, age frozen<age> ); CREATE TABLE json ( title text, type text, properties list<frozen<prop>>, dimensions tuple<int, int>, PRIMARY KEY (title) );
  57. Common use cases - Sensor data (Zonar) - Fraud detection (Barracuda) - Playlists and collections (Spotify) - Personalization and recommendation engines (eBay) - Messaging (Instagram) - Event Sourcing!
  58. Common anti use cases - Queue - Search engine
  59. Tombstones - Understanding Cassandra tombstones
  60. DataStax Academy - Introduction to Apache Cassandra - Data Modeling - DataStax Enterprise Foundations of Apache Cassandra - DataStax Enterprise Operations with Apache Cassandra - DataStax Enterprise Search - DataStax Enterprise Analytics with Apache Spark - DataStax Enterprise Graph
  61. Competition? ScyllaDB - Cassandra without the JVM - same protocol, SSTable compatibility - C++ and the Seastar library - 1,000,000 IOPS
  62. Not covered - schema migrations - backups - DSE
