Cassandra 2.1

Published in: Technology

  1. 1. Cassandra 2.1 (mostly). Jonathan Ellis, CTO, DataStax; Project Chair, Apache Cassandra. ©2013 DataStax Confidential. Do not distribute without consent.
  2. 2. Five Years of Cassandra [timeline, Jul-08 to Jul-13: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ..., 2.0; DSE]
  3. 3. Core values
      • Massively scalable
      • High performance
      • Reliable/Available
      [diagram labels: Cassandra, HBase, Redis, MySQL]
  4. 4. New Core Value
      • Massively scalable
      • High performance
      • Reliable/Available
      • Productivity + ease of use
      CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int
      );
      CREATE INDEX ON users(state);
      SELECT * FROM users
      WHERE state='Texas' AND birth_date > 1950;
  7. 7. Collections
      CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int
      );
      CREATE TABLE users_addresses (
        user_id uuid REFERENCES users,
        email text
      );
      SELECT * FROM users NATURAL JOIN users_addresses;
      X (the join-based design is crossed out on the slide)
  9. 9. Collections
      CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        state text,
        birth_date int,
        email_addresses set<text>
      );
      UPDATE users
      SET email_addresses = email_addresses + {'jbellis@gmail.com', 'jbellis@datastax.com'};
  10. 10. Cassandra 2.0
  15. 15. Race condition
      First client:
        SELECT name
        FROM users
        WHERE username = 'pmcfadin';
        (0 rows)
        INSERT INTO users
          (username, name, email, password, created_date)
        VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'],
                'ba27e03fd9...', '2011-06-20 13:50:00');
      Second client:
        SELECT name
        FROM users
        WHERE username = 'pmcfadin';
        (0 rows)
        INSERT INTO users
          (username, name, email, password, created_date)
        VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'],
                'ea24e13ad9...', '2011-06-20 13:50:01');
        This one wins
  20. 20. Lightweight transactions
      First client:
        INSERT INTO users
          (username, name, email, password, created_date)
        VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'],
                'ba27e03fd9...', '2011-06-20 13:50:00')
        IF NOT EXISTS;
         [applied]
        -----------
              True
      Second client:
        INSERT INTO users
          (username, name, email, password, created_date)
        VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'],
                'ea24e13ad9...', '2011-06-20 13:50:01')
        IF NOT EXISTS;
         [applied] | username | created_date   | name
        -----------+----------+----------------+-----------------
             False | pmcfadin | 2011-06-20 ... | Patrick McFadin
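The contrast between the race-condition slides and `IF NOT EXISTS` can be sketched in a few lines of Python. This is only an illustration of the observable behavior: the `Table` class and its method names are hypothetical, and a single in-process lock stands in for the Paxos round that Cassandra actually runs across replicas.

```python
import threading


class Table:
    """Toy in-memory table keyed by username (illustrative only)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # assumption: stands in for the Paxos round

    def insert(self, username, row):
        # Plain INSERT is an upsert: a later write silently replaces an earlier one.
        self._rows[username] = row

    def insert_if_not_exists(self, username, row):
        # Conditional INSERT: the existence check and the write happen atomically.
        with self._lock:
            existing = self._rows.get(username)
            if existing is not None:
                # Not applied: report the row that is already there.
                return {"applied": False, **existing}
            self._rows[username] = row
            return {"applied": True}

    def get(self, username):
        return self._rows.get(username)


users = Table()
first = users.insert_if_not_exists(
    "pmcfadin", {"name": "Patrick McFadin", "password": "ba27e03fd9..."}
)
second = users.insert_if_not_exists(
    "pmcfadin", {"name": "Patrick McFadin", "password": "ea24e13ad9..."}
)
print(first["applied"], second["applied"])
```

The second call reports `applied` as false and hands back the existing row, which mirrors the result set shown on the slide.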
  21. 21. Atomic log appends with LWT
      CREATE TABLE log (
        log_name text,
        seq int static,
        logged_at timeuuid,
        entry text,
        primary key (log_name, logged_at)
      );
      INSERT INTO log (log_name, seq)
      VALUES ('foo', 0);
  22. 22. Atomic log appends with LWT
      BEGIN BATCH
        UPDATE log SET seq = 1
        WHERE log_name = 'foo'
        IF seq = 0;
        INSERT INTO log (log_name, logged_at, entry)
        VALUES ('foo', now(), 'test');
      APPLY BATCH;
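The batch above appends an entry only if `seq` still holds the value the writer last saw. A minimal Python sketch of that compare-and-set pattern follows; the `Log` class is hypothetical and runs in a single process, so it shows the semantics rather than the distributed protocol.

```python
class Log:
    """Toy single-partition log with a static sequence number (illustrative only)."""

    def __init__(self):
        self.seq = 0       # mirrors the static `seq` column
        self.entries = []  # mirrors the clustered rows

    def append(self, expected_seq, entry):
        # The batch applies as a unit: bump seq and add the entry only if
        # seq still has the value this writer last saw.
        if self.seq != expected_seq:
            return False
        self.seq = expected_seq + 1
        self.entries.append(entry)
        return True


log = Log()
print(log.append(0, "test"))      # first writer saw seq = 0, so it applies
print(log.append(0, "conflict"))  # second writer also saw seq = 0, so it is rejected
```

A rejected writer would re-read `seq` and try again with the new value.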
  23. 23. Details
      • http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
      • Paxos state is durable + quorum based
      • Paxos made Simple
      • Immediate consistency with no leader election or failover
      • ConsistencyLevel.SERIAL
      • 4 round trips vs 1 for normal updates
      • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
  24. 24. Reads in a cluster [diagram: client, coordinator, and nodes at 40% busy, 90% busy, and 30% busy]
  33. 33. A failure [diagram: client, coordinator, and nodes at 40% busy, 90% busy, and 30% busy; a node fails (X) and the read ends in a timeout]
  40. 40. Rapid read protection [diagram: client, coordinator, and nodes at 40% busy, 90% busy, and 30% busy; a node fails (X) and the read still ends in success]
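One way to read the two diagram sequences is that, with rapid read protection, the coordinator asks another replica when the first one is slow to answer instead of waiting for the timeout. The following Python sketch simulates that idea deterministically. The function name, its parameters, and the millisecond values are all hypothetical; it is not Cassandra's implementation.

```python
def read_latency(replica_latencies_ms, speculate_after_ms=None, timeout_ms=1000):
    """Return the latency of a read in ms, or None if it times out.

    replica_latencies_ms: response time of each replica, in the order the
    coordinator would try them; None means the replica never answers.
    speculate_after_ms: if set, send the request to the next replica when
    no answer has arrived within this many milliseconds.
    """
    best = None
    start = 0
    for latency in replica_latencies_ms:
        if latency is not None:
            finish = start + latency
            best = finish if best is None else min(best, finish)
        if speculate_after_ms is None:
            break  # no speculation: only the first replica is ever asked
        if best is not None and best <= start + speculate_after_ms:
            break  # an answer arrives before the next speculative request
        start += speculate_after_ms
    if best is None or best > timeout_ms:
        return None
    return best


# First replica is down, second answers in 20 ms.
print(read_latency([None, 20]))                         # no protection: times out
print(read_latency([None, 20], speculate_after_ms=50))  # retried on another replica
```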
  41. 41. Rapid Read Protection NONE
  42. 42. Latency (mid-compaction)
  43. 43. Cold data 10,000 req/s 5,000 req/s 4,000 req/s 10 req/s
  45. 45. Cold data compaction 10 req/s 10,000 req/s
  46. 46. Cassandra 2.1
  47. 47. User defined types
      CREATE TYPE address (
        street text,
        city text,
        zip_code int,
        phones set<text>
      );
      CREATE TABLE users (
        id uuid PRIMARY KEY,
        name text,
        addresses map<text, address>
      );
      SELECT id, name, addresses.city, addresses.phones FROM users;
       id       | name    | addresses.city | addresses.phones
      ----------+---------+----------------+--------------------------
       63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
  48. 48. Collection indexing
      CREATE TABLE songs (
        id uuid PRIMARY KEY,
        artist text,
        album text,
        title text,
        data blob,
        tags set<text>
      );
      CREATE INDEX song_tags_idx ON songs(tags);
      SELECT * FROM songs WHERE tags CONTAINS 'blues';
       id       | album         | artist            | tags                  | title
      ----------+---------------+-------------------+-----------------------+------------------
       5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
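Conceptually, an index on a collection column needs one entry per element so that `CONTAINS` can find rows by a single tag. The Python sketch below shows that inverted-index idea; the `TagIndex` class is hypothetical, the second row is made up for contrast, and nothing here reflects how Cassandra stores its indexes.

```python
from collections import defaultdict


class TagIndex:
    """Toy inverted index: one entry per element of each row's tag set."""

    def __init__(self):
        self._rows_by_tag = defaultdict(set)

    def add(self, row_id, tags):
        for tag in tags:
            self._rows_by_tag[tag].add(row_id)

    def contains(self, tag):
        # Equivalent in spirit to: WHERE tags CONTAINS <tag>
        return sorted(self._rows_by_tag.get(tag, ()))


index = TagIndex()
index.add("5027b27e", {"acoustic", "blues"})
index.add("hypothetical-id", {"rock"})  # made-up second row, not from the slides
print(index.contains("blues"))
```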
  49. 49. (UDT indexing?)
  58. 58. Counters++
      • simpler implementation, no more edge cases
      • possible to properly repair now
      • significantly less garbage and internode traffic generated
      • better performance for 99% of uses
      • RF>1, replicate_on_write=true
      • topology changes not leading to data loss (#4071)
      • commitlog now 100% safe to replay (#4417)
      • Internal format overhaul still coming in 3.0 (#6506)
  65. 65. What hasn’t changed
      • same API
      • same average throughput
      • same restrictions on mixing counter and non-counter columns
      • same restrictions on mixing counter and non-counter updates
      • same restrictions on counter deletes
      • same retry limitations
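The retry limitation follows from counter increments not being idempotent: if an acknowledgement is lost, the client cannot tell whether the increment was applied, and retrying may count it twice. A small Python sketch, with a hypothetical `Replica` class and `send_with_retry` helper, makes the difference from an ordinary overwrite visible.

```python
class Replica:
    """Toy replica holding one counter and one regular value (illustrative only)."""

    def __init__(self):
        self.counter = 0
        self.value = None

    def increment(self, delta):
        self.counter += delta  # not idempotent: applying twice counts twice

    def set_value(self, value):
        self.value = value     # idempotent: applying twice gives the same result


def send_with_retry(operation, ack_lost):
    """Apply `operation`; if the acknowledgement is lost, the client retries."""
    operation()
    if ack_lost:
        operation()  # the client cannot tell that the first attempt succeeded


replica = Replica()
send_with_retry(lambda: replica.set_value("x"), ack_lost=True)
send_with_retry(lambda: replica.increment(1), ack_lost=True)
print(replica.value, replica.counter)  # the overwrite is fine, the counter is off by one
```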
  66. 66. Writes (low contention)
  67. 67. Writes (high contention)
  68. 68. Data directories (2.0)
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-CompressionInfo.db
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Data.db
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Filter.db
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Index.db
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Statistics.db
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Summary.db
      /var/lib/cassandra/data/foo/bar/foo-bar-jb-1-TOC.txt
  69. 69. Data directories (2.1)
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-CompressionInfo.db
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Data.db
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Filter.db
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Index.db
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Statistics.db
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Summary.db
      /var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-TOC.txt
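The file names on both slides appear to follow a keyspace-table-version-generation-component pattern (`foo-bar-jb-1-Data.db` versus `foo-bar-ka-1-Data.db`). Assuming that pattern holds, a short Python helper can split a path into its parts. The `parse_sstable_name` function and the field names are illustrative, and the sketch ignores temporary files or any other naming variants.

```python
from collections import namedtuple

SSTableFile = namedtuple("SSTableFile", "keyspace table version generation component")


def parse_sstable_name(path):
    """Split a file name such as foo-bar-ka-1-Data.db into its parts."""
    filename = path.rsplit("/", 1)[-1]
    # Assumption: exactly four hyphen-separated fields precede the component.
    keyspace, table, version, generation, component = filename.split("-", 4)
    return SSTableFile(keyspace, table, version, int(generation), component)


old = parse_sstable_name("/var/lib/cassandra/data/foo/bar/foo-bar-jb-1-Data.db")
new = parse_sstable_name(
    "/var/lib/cassandra/flush/foo/bar-2fbb89709a6911e3b7dc4d7d4e3ca4b4/foo-bar-ka-1-Data.db"
)
print(old.version, new.version)
```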
  70. 70. Inefficient bloom filters [diagram: + = ?]
  74. 74. HyperLogLog applied
  75. 75. HLL and compaction
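The slides give only titles here, so the following is a guess at the underlying idea: when two sstables are compacted, the number of distinct keys in the result is unknown ("+ = ?"), and a HyperLogLog sketch per sstable can be merged to estimate it, which in turn would let the new bloom filter be sized sensibly. The Python code below is a minimal, generic HyperLogLog under that assumption. The class, the register count, and the SHA-1 hash are illustrative choices, not Cassandra's implementation.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality estimator (illustrative only)."""

    def __init__(self, p=10):
        self.p = p                # number of index bits
        self.m = 1 << p           # number of registers
        self.registers = [0] * self.m

    def add(self, key):
        # Take a 64-bit hash of the key.
        h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                  # first p bits pick the register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


# Two hypothetical sstables whose key ranges overlap.
a, b = HyperLogLog(), HyperLogLog()
for i in range(0, 6000):
    a.add(f"key{i}")
for i in range(4000, 10000):
    b.add(f"key{i}")
union_estimate = a.merge(b).estimate()
print(round(a.estimate()), round(b.estimate()), round(union_estimate))
```

Adding the two per-sstable counts would overstate the result by the size of the overlap; the merged sketch estimates the union directly.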
  78. 78. More-efficient repair
  87. 87. Implications for LCS (and STCS)
  88. 88. The new query cache
  89. 89. The new row cache
      CREATE TABLE notifications (
        target_user text,
        notification_id timeuuid,
        source_id uuid,
        source_type text,
        activity text,
        PRIMARY KEY (target_user, notification_id)
      )
      WITH CLUSTERING ORDER BY (notification_id DESC)
      AND caching = 'rows_only'
      AND rows_per_partition_to_cache = '3';
  90. 90. The new row cache
       target_user | notification_id | source_id | source_type | activity
      -------------+-----------------+-----------+-------------+-------------------------
       nick        | e1bd2bcb-       | d972b679- | photo       | jbellis liked
       nick        | 321998c-        | d972b679- | photo       | rbranson commented
       nick        | ea1c5d35-       | 88a049d5- | user        | mbulman created account
       nick        | 5321998c-       | 64613f27- | photo       | jbellis commented
       nick        | 07581439-       | 076eab7e- | user        | thobbs created account
       rbranson    | 1c34467a-       | f04e309f- | user        | jbellis created account
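The table definition caches only the first three rows of each partition, which suits "latest N notifications" queries. A Python sketch of that head-of-partition idea follows. The `HeadOfPartitionCache` class is hypothetical, and the rows are taken from the slide's `nick` partition in the order shown.

```python
class HeadOfPartitionCache:
    """Toy cache holding only the first N rows of each partition."""

    def __init__(self, rows_per_partition):
        self.rows_per_partition = rows_per_partition
        self._cache = {}

    def load(self, partition_key, rows_in_clustering_order):
        # Keep just the head of the partition, not the whole partition.
        self._cache[partition_key] = list(
            rows_in_clustering_order[: self.rows_per_partition]
        )

    def read(self, partition_key, limit):
        """Return (rows, served_from_cache)."""
        cached = self._cache.get(partition_key)
        if cached is not None and limit <= self.rows_per_partition:
            return cached[:limit], True
        return None, False  # the caller would fall back to reading from disk


notifications = [
    "jbellis liked",
    "rbranson commented",
    "mbulman created account",
    "jbellis commented",
    "thobbs created account",
]
cache = HeadOfPartitionCache(rows_per_partition=3)
cache.load("nick", notifications)
print(cache.read("nick", 3))  # served from the cache
print(cache.read("nick", 5))  # needs more rows than are cached
```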
  92. 92. Read performance
  93. 93. Reads post-compaction
  94. 94. Questions?
