Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Talk at Cassandra Conf 2013


  1. 1. Modern Apache Cassandra Jonathan Ellis Project Chair, Apache Cassandra CTO, DataStax ©2013 DataStax Confidential. Do not distribute without consent. 1
  2. 2. Five years of Cassandra 0.1 Jul-08 ... 0.3 Jul-09 0.6 May-10 0.7 Feb-11 1.0 Dec-11 DSE 1.2 Oct-12 2.0 Jul-13
  3. 3. Application/Use Case • Social Signals: like/want/own features for eBay product and item pages • Hunch taste graph for eBay users and items • Many time series use cases Why Cassandra? • Multi-datacenter • Scalable • Write performance • Distributed counters • Hadoop support ACE
  4. 4. Time series data
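The time-series pattern alluded to here is typically modeled as one partition per source with a time-based clustering column. A minimal CQL sketch, not from the deck itself; the table name, columns, and uuid are illustrative:

    CREATE TABLE temperature_readings (
        sensor_id    uuid,
        reading_time timestamp,
        value        double,
        PRIMARY KEY (sensor_id, reading_time)
    ) WITH CLUSTERING ORDER BY (reading_time DESC);

    -- Newest-first readings for one sensor are served from a single partition.
    SELECT reading_time, value FROM temperature_readings
     WHERE sensor_id = 7f3d5e1a-0000-4000-8000-000000000001
     LIMIT 10;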
  5. 5. Multi-datacenter support
  6. 6. Distributed counters
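Counters are exposed in CQL as a dedicated column type that is incremented in place rather than overwritten. A small sketch; the table and column names are illustrative:

    CREATE TABLE page_views (
        page  text PRIMARY KEY,
        views counter
    );

    -- Counter columns accept only increments and decrements.
    UPDATE page_views SET views = views + 1 WHERE page = '/home';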
  7. 7. Hadoop support
  8. 8. Application/Use Case • Adobe AudienceManager: web analytics, content management, and online advertising Why Cassandra? • Low-latency • Scalable • Multi-datacenter • Tuneable consistency ACE
  9. 9. Bootstrapping
  10. 10. Bootstrapping
  11. 11. s d Bootstrapping s d s d s d
  12. 12. s d Bootstrapping s d s d s d
  13. 13. Bootstrapping
  14. 14. Tuneable consistency •(We’ll come back to this)
  15. 15. Application/Use Case • Logging • Notifications Why Cassandra? • Efficient writes • Durable • Scalable • High availability ACE
  16. 16. Durable + efficient writes [diagram: write(k1, c1:v1) goes to the memtable in memory and is appended to the commit log on disk]
  17. 17. [diagram: write(k1, c1:v) is added to the memtable and the commit log]
  18. 18. [diagram: write(k1, c2:v) merges into k1's row in the memtable; the commit log gets another append]
  19. 19. [diagram: write(k2, c1:v c2:v) adds a second row to the memtable and another commit log append]
  20. 20. [diagram: write(k1, c1:v c3:v) updates k1's row in the memtable again]
  21. 21. [diagram: on flush, the memtable is written to an SSTable on disk with its index / bloom filter, and the commit log is cleaned up]
  22. 22. High availability •99.9999% availability on Cassandra •(We’ll come back to this, too)
  23. 23. Core values •Massive scalability •High performance •Ease of use •Reliability/Availability Cassandra MySQL HBase Redis
  24. 24. VLDB benchmark (RWS) [chart: throughput (ops/sec) vs. number of nodes, 0-12, for Cassandra, MySQL, HBase, and Redis]
  25. 25. Endpoint benchmark (RW) [chart: throughput (ops/sec) vs. number of nodes, 1-32, for Cassandra, HBase, and MongoDB]
  26. 26. Ease of use CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  27. 27. Classic partitioning (SPOF) partition 1 partition 2 partition 3 partition 4 router client
  28. 28. (Not a theoretical problem) https://speakerdeck.com/mitsuhiko/a-year-of-mongodb http://aphyr.com/posts/288-the-network-is-reliable
  29. 29. Fully distributed, no SPOF Client p3 p6 p1 p1 p1
  30. 30. Partitioning Primary key determines placement* jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  31. 31. PK Murmur Hash jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e... Murmur* hash operation yields a 64-bit number for keys of any size.
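The token Cassandra derives from a partition key is visible from CQL via the token() function; for example, against the users table defined on slide 26, this returns the 64-bit Murmur3 token used for placement:

    SELECT id, name, token(id) FROM users;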
  32. 32. The “token ring” Node A Node B Node D Node C
  33. 33. Token range ownership (slides 34-37 repeat this table while stepping through where each hashed key lands):

      Node | Start            | End
      -----+------------------+-----------------
       A   | 0xc000000000..1  | 0x0000000000..0
       B   | 0x0000000000..1  | 0x4000000000..0
       C   | 0x4000000000..1  | 0x8000000000..0
       D   | 0x8000000000..1  | 0xc000000000..0

      jim    5e02739678...  ->  owned by C
      carol  a9a0198010...  ->  owned by D
      johnny f4eb27cea7...  ->  owned by A
      suzy   78b421309e...  ->  owned by C
  38. 38. Replication Node A Node D carol a9a0198010... Node B Node C
  39. 39. Node A Node D carol a9a0198010... Node B Node C
  40. 40. Node A Node D carol a9a0198010... Node B Node C
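The replica count shown here, and the per-datacenter placement behind the multi-datacenter slides earlier, are set when the keyspace is created. A sketch; the keyspace and datacenter names are illustrative and must match what the cluster's snitch reports:

    CREATE KEYSPACE myapp
      WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'us_east': 3,     -- three replicas of each row in this datacenter
        'eu_west': 3
      };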
  41. 41. Virtual nodes [diagram: without vnodes, each of nodes A-D owns one contiguous range of the ring; with vnodes, each node owns many small ranges (A, A', A'', B, B', ...) scattered around the ring]
  42. 42. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy
  43. 43. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy
  44. 44. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy
  45. 45. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy
  46. 46. A closer look at reads 90% busy Client Coordinator 30% busy 40% busy
  47. 47. Rapid read protection 90% busy Client Coordinator 30% busy 40% busy
  48. 48. Rapid read protection 90% busy Client Coordinator 30% busy 40% busy
  49. 49. Rapid read protection 90% busy Client Coordinator 30% busy 40% busy
  50. 50. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy
  51. 51. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy
  52. 52. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy
  53. 53. Rapid read protection 90% busy Client X Coordinator 30% busy 40% busy
  54. 54. Rapid Read Protection NONE
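Rapid read protection is controlled per table through the speculative_retry property introduced in 2.0; its documented values include a percentile, a fixed latency such as '10ms', 'ALWAYS', and 'NONE'. A hedged sketch against the users table from slide 26:

    -- Send a redundant read to another replica when the first replica
    -- has not answered within the table's 99th-percentile read latency.
    ALTER TABLE users WITH speculative_retry = '99percentile';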
  55. 55. Consistency levels 90% busy Client Coordinator 30% busy 40% busy
  56. 56. Consistency levels 90% busy Client Coordinator 30% busy 40% busy
  57. 57. Consistency levels 90% busy Client Coordinator 30% busy 40% busy
  58. 58. Consistency levels 90% busy Client Coordinator 30% busy 40% busy
  59. 59. Consistency levels 90% busy Client Coordinator 30% busy 40% busy
  60. 60. Consistency levels •ONE •QUORUM •LOCAL_QUORUM •LOCAL_ONE •TWO •ALL
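In cqlsh the level is chosen with the CONSISTENCY command and applies to the statements that follow; drivers expose the same levels per request. A short sketch; the uuid is illustrative:

    CONSISTENCY LOCAL_QUORUM;
    -- Subsequent reads and writes in this session must be acknowledged by
    -- a quorum of replicas in the local datacenter.
    SELECT name, state FROM users
     WHERE id = 63bf691f-0000-4000-8000-000000000002;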
  61. 61. Race condition SELECT name FROM users WHERE username = 'pmcfadin'; #CASSANDRAEU
  62. 62. Race condition #CASSANDRAEU SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin';
  63. 63. Race condition #CASSANDRAEU SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00'); (0 rows)
  64. 64. Race condition #CASSANDRAEU SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00'); (0 rows) INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01');
  65. 65. Race condition #CASSANDRAEU SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows) SELECT name FROM users WHERE username = 'pmcfadin'; INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00'); (0 rows) This one wins INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01');
  66. 66. Lightweight transactions INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00') IF NOT EXISTS; #CASSANDRAEU
  67. 67. Lightweight transactions INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00') IF NOT EXISTS; [applied] ----------True #CASSANDRAEU INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01') IF NOT EXISTS;
  68. 68. Lightweight transactions INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00') IF NOT EXISTS; [applied] ----------True #CASSANDRAEU INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01') IF NOT EXISTS; [applied] | username | created_date | name -----------+----------+----------------+---------------False | pmcfadin | 2011-06-20 ... | Patrick McFadin
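Besides IF NOT EXISTS, 2.0's lightweight transactions also allow conditions on current column values, which is the usual way to guard an update against the race shown earlier. A sketch reusing the slide's table and its truncated password hashes:

    UPDATE users SET password = 'ea24e13ad9...'
     WHERE username = 'pmcfadin'
     IF password = 'ba27e03fd9...';

    -- [applied] comes back False, along with the current value, if another
    -- writer changed the password first.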
  69. 69. Paxos •All operations are quorum-based •Each replica sends information about unfinished operations to the leader during prepare •Paxos made Simple
  70. 70. Details •4 round trips vs 1 for normal updates •Paxos state is durable •Immediate consistency with no leader election or failover •ConsistencyLevel.SERIAL •http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
  71. 71. Use with caution •Great for 1% of your application •Eventual consistency is your friend • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
  72. 72. Cassandra 2.1
  73. 73. User defined types CREATE TYPE address ( street text, city text, zip_code int, phones set<text> ) CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses map<text, address> ) SELECT id, name, addresses.city, addresses.phones FROM users; id | name | addresses.city | addresses.phones --------------------+----------------+-------------------------63bf691f | jbellis | Austin | {'512-4567', '512-9999'}
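This is the pre-release syntax; in the 2.1 that shipped, a user-defined type nested inside a collection must be declared frozen, and rows are written with typed literals. A hedged sketch; the uuid, street, and phone values are illustrative:

    CREATE TABLE users (
        id        uuid PRIMARY KEY,
        name      text,
        addresses map<text, frozen<address>>
    );

    INSERT INTO users (id, name, addresses)
    VALUES (63bf691f-0000-4000-8000-000000000003, 'jbellis',
            {'home': {street: '123 Main St', city: 'Austin',
                      zip_code: 78759, phones: {'512-4567', '512-9999'}}});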
  74. 74. Collection indexing CREATE TABLE songs ( id uuid PRIMARY KEY, artist text, album text, title text, data blob, tags set<text> ); CREATE INDEX song_tags_idx ON songs(tags); SELECT * FROM songs WHERE 'blues' IN tags; id | album | artist | tags | title ----------+---------------+-------------------+-----------------------+-----------------5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
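The IN form above is the syntax as presented at the conference; in released 2.1 the same indexed collection query is written with CONTAINS:

    SELECT * FROM songs WHERE tags CONTAINS 'blues';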
  75. 75. More-efficient repair
  76. 76. More-efficient repair
  77. 77. More-efficient repair
  78. 78. More-efficient repair
  79. 79. More-efficient repair
  80. 80. More-efficient repair
  81. 81. More-efficient repair
  82. 82. More-efficient repair
  83. 83. More-efficient repair
  84. 84. 2.1 roadmap •Efficient handling of cold data •Counters 2.0 •Only repair new-since-last-repair data •January/February 2014
  85. 85. Questions?