Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Talk at Cassandra conf 2013

Transcript

  • 1. Modern Apache Cassandra. Jonathan Ellis, Project Chair, Apache Cassandra; CTO, DataStax. ©2013 DataStax
  • 2. Five years of Cassandra 0.1 Jul-08 ... 0.3 Jul-09 0.6 May-10 0.7 Feb-11 1.0 Dec-11 DSE 1.2 Oct-12 2.0 Jul-13
  • 3. Application/Use Case • Social Signals: like/want/own features for eBay product and item pages • Hunch taste graph for eBay users and items • Many time series use cases Why Cassandra? • Multi-datacenter • Scalable • Write performance • Distributed counters • Hadoop support ACE
  • 4. Time series data
  • 5. Multi-datacenter support
  • 6. Distributed counters
  • 7. Hadoop support
  • 8. Application/Use Case • Adobe AudienceManager: web analytics, content management, and online advertising Why Cassandra? • Low-latency • Scalable • Multi-datacenter • Tuneable consistency ACE
  • 9-13. Bootstrapping (animated diagram)
  • 14. Tuneable consistency •(We’ll come back to this)
  • 15. Application/Use Case • Logging • Notifications Why Cassandra? • Efficient writes • Durable • Scalable • High availability ACE
  • 16. Durable + efficient writes: write(k1, c1:v1) goes to the memtable (memory) and to the commit log (hard drive)
  • 17. write(k1, c1:v). Memtable: k1 c1:v. Commit log: k1 c1:v
  • 18. write(k1, c2:v). Memtable: k1 c1:v c2:v. Commit log: k1 c1:v; k1 c2:v
  • 19. write(k2, c1:v c2:v). Memtable: k1 c1:v c2:v; k2 c1:v c2:v. Commit log: k1 c1:v; k1 c2:v; k2 c1:v c2:v
  • 20. write(k1, c1:v c3:v). Memtable: k1 c1:v c2:v c3:v; k2 c1:v c2:v. Commit log: k1 c1:v; k1 c2:v; k2 c1:v c2:v; k1 c1:v c3:v
  • 21. Flush (index / BF, cleanup). SSTable on the hard drive: k1 c1:v c2:v c3:v; k2 c1:v c2:v
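The write path on slides 16-21 can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `Node` and its methods are hypothetical, the "commit log" and "SSTables" are in-memory lists rather than files, and the index and Bloom filter built at flush time are omitted.

```python
class Node:
    """Toy model of the write path: commit log + memtable, flushed to SSTables."""

    def __init__(self):
        self.commit_log = []  # append-only record of writes (stands in for disk)
        self.memtable = {}    # key -> {column: value}, held in memory
        self.sstables = []    # immutable sorted snapshots (stand in for disk)

    def write(self, key, columns):
        # Append to the commit log first so the write is durable,
        # then merge the columns into the in-memory row.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Write the memtable out as a sorted, immutable table, then clean up:
        # the flushed writes no longer need the commit log.
        sstable = {
            key: dict(sorted(cols.items()))
            for key, cols in sorted(self.memtable.items())
        }
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []


# Replay the writes from slides 17-20.
node = Node()
node.write("k1", {"c1": "v"})
node.write("k1", {"c2": "v"})
node.write("k2", {"c1": "v", "c2": "v"})
node.write("k1", {"c1": "v", "c3": "v"})
print(node.memtable)         # k1 holds c1, c2, c3; k2 holds c1, c2
print(len(node.commit_log))  # four appended entries
node.flush()
print(node.sstables[0])
```

Every write costs one sequential append plus an in-memory update, which is why the slides pair "durable" with "efficient".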
  • 22. High availability •99.9999% availability on Cassandra •(We’ll come back to this, too)
  • 23. Core values •Massive scalability •High performance •Ease of use •Reliability/Availability Cassandra MySQL HBase Redis
  • 24. VLDB benchmark (RWS) (plot: throughput in ops/sec, 0 to 80000, against number of nodes, 0 to 12, for Cassandra, MySQL, HBase, Redis)
  • 25. Endpoint benchmark (RW) (plot: throughput in ops/sec, 0 to 35000, against number of nodes, 1 to 32, for Cassandra, HBase, MongoDB)
  • 26. Ease of use CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  • 27. Classic partitioning (SPOF) partition 1 partition 2 partition 3 partition 4 router client
  • 28. (Not a theoretical problem) https://speakerdeck.com/mitsuhiko/a-year-of-mongodb http://aphyr.com/posts/288-the-network-is-reliable
  • 29. Fully distributed, no SPOF Client p3 p6 p1 p1 p1
  • 30. Partitioning Primary key determines placement* jim age: 36 car: camaro gender: M carol age: 37 car: subaru gender: F johnny age:12 gender: M suzy age:10 gender: F
  • 31. PK Murmur Hash jim 5e02739678... carol a9a0198010... johnny f4eb27cea7... suzy 78b421309e... Murmur* hash operation yields a 64-bit number for keys of any size.
  • 32. The “token ring” Node A Node B Node D Node C
  • 33-37. Token ranges per node:
      Node | Start           | End
      A    | 0xc000000000..1 | 0x0000000000..0
      B    | 0x0000000000..1 | 0x4000000000..0
      C    | 0x4000000000..1 | 0x8000000000..0
      D    | 0x8000000000..1 | 0xc000000000..0
      Keys: jim 5e02739678..., carol a9a0198010..., johnny f4eb27cea7..., suzy 78b421309e...
  • 38-40. Replication (diagram: ring of Node A, Node B, Node C, Node D; key carol a9a0198010...)
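The token ring and replication slides can be sketched as a sorted-list lookup. This is a simplified illustration, not Cassandra's code. It assumes unsigned 64-bit tokens as drawn on the slides, it pads the slides' truncated hex prefixes with zeros purely so they can be compared (the real hash digits are not shown), and it assumes a replication factor of 3 with replicas chosen by walking the ring clockwise. All function names are hypothetical.

```python
from bisect import bisect_left

# End of each node's range, taken from the slides' table.
# A node owns every token in (previous node's end, its own end].
RING = [
    (0x0000000000000000, "A"),
    (0x4000000000000000, "B"),
    (0x8000000000000000, "C"),
    (0xC000000000000000, "D"),
]
ENDS = [end for end, _ in RING]


def owner_index(token):
    # First range end >= token; tokens past the last end wrap around to A.
    return bisect_left(ENDS, token) % len(RING)


def replicas(token, rf=3):
    # ASSUMPTION: rf=3 and "next nodes clockwise" placement, for illustration.
    first = owner_index(token)
    return [RING[(first + i) % len(RING)][1] for i in range(rf)]


def token_from_prefix(prefix):
    # ASSUMPTION: zero-pad the truncated prefix from the slide to 64 bits.
    return int(prefix.ljust(16, "0"), 16)


KEYS = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}
for name, prefix in KEYS.items():
    print(name, replicas(token_from_prefix(prefix)))
```

Under these assumptions carol's token falls in D's range, so her row is stored on D and then on A and B, while johnny's token lies past the last range end and wraps around to A.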
  • 41. Virtual nodes (diagram: a ring of Node A, Node B, Node C, Node D without vnodes, compared with a ring of smaller ranges A, A’, A’’, B, B’, C, C’, C’’, D, D’ with vnodes)
  • 42-46. A closer look at reads (diagram: Client, Coordinator, and nodes that are 90%, 30%, and 40% busy)
  • 47-53. Rapid read protection (diagram: Client, Coordinator, and nodes that are 90%, 30%, and 40% busy; an X appears from slide 50 on)
  • 54. Rapid Read Protection NONE
  • 55-59. Consistency levels (diagram: Client, Coordinator, and nodes that are 90%, 30%, and 40% busy)
  • 60. Consistency levels •ONE •QUORUM •LOCAL_QUORUM •LOCAL_ONE •TWO •ALL
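As a rough illustration of what the levels on slide 60 mean, the sketch below computes how many replica responses a coordinator waits for. The function names are hypothetical, and the model assumes a single replication factor `rf` overall plus an optional `local_rf` for the coordinator's own datacenter; it ignores everything else a real cluster does.

```python
def required_acks(level, rf, local_rf=None):
    """Replica responses needed before a request at this level succeeds."""
    local = rf if local_rf is None else local_rf
    needed = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": rf // 2 + 1,           # majority of all replicas
        "ALL": rf,
        "LOCAL_ONE": 1,                  # one replica in the local datacenter
        "LOCAL_QUORUM": local // 2 + 1,  # majority of the local replicas
    }
    return needed[level]


def reads_see_latest_write(read_level, write_level, rf):
    # A read is guaranteed to overlap the latest write when R + W > RF.
    r = required_acks(read_level, rf)
    w = required_acks(write_level, rf)
    return r + w > rf


print(required_acks("QUORUM", rf=3))                      # 2
print(required_acks("LOCAL_QUORUM", rf=6, local_rf=3))    # 2
print(reads_see_latest_write("QUORUM", "QUORUM", rf=3))   # True
print(reads_see_latest_write("ONE", "ONE", rf=3))         # False
```

This is the sense in which consistency is tuneable: lower levels wait for fewer replicas and stay available with more nodes down, at the price of possibly stale reads.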
  • 61-65. Race condition #CASSANDRAEU
      Client 1: SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows)
      Client 2: SELECT name FROM users WHERE username = 'pmcfadin'; (0 rows)
      Client 1: INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00');
      Client 2: INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01'); (This one wins)
  • 66-68. Lightweight transactions
      INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ba27e03fd9...', '2011-06-20 13:50:00') IF NOT EXISTS;
       [applied]
      -----------
            True
      INSERT INTO users (username, name, email, password, created_date) VALUES ('pmcfadin', 'Patrick McFadin', ['patrick@datastax.com'], 'ea24e13ad9...', '2011-06-20 13:50:01') IF NOT EXISTS;
       [applied] | username | created_date   | name
      -----------+----------+----------------+-----------------
           False | pmcfadin | 2011-06-20 ... | Patrick McFadin
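The difference between the race condition on slides 61-65 and the `IF NOT EXISTS` inserts on slides 66-68 can be modelled with a toy in-memory table. This is only a sketch: the `Users` class and its methods are hypothetical, a single Python dict stands in for the whole cluster, and nothing here implements the Paxos machinery that makes the conditional insert safe across replicas.

```python
class Users:
    """Toy single-process stand-in for the users table."""

    def __init__(self):
        self.rows = {}  # username -> (timestamp, row)

    def insert(self, username, row, timestamp):
        # Plain insert: the write with the newest timestamp wins.
        current = self.rows.get(username)
        if current is None or timestamp >= current[0]:
            self.rows[username] = (timestamp, row)

    def insert_if_not_exists(self, username, row, timestamp):
        # Conditional insert: returns (applied, existing_row).
        current = self.rows.get(username)
        if current is not None:
            return False, current[1]
        self.rows[username] = (timestamp, row)
        return True, None


first = {"name": "Patrick McFadin", "password": "ba27e03fd9..."}
second = {"name": "Patrick McFadin", "password": "ea24e13ad9..."}

# Race condition: both clients saw 0 rows, both insert, the later write wins.
plain = Users()
plain.insert("pmcfadin", first, "2011-06-20 13:50:00")
plain.insert("pmcfadin", second, "2011-06-20 13:50:01")
print(plain.rows["pmcfadin"][1]["password"])  # ea24e13ad9...

# Lightweight transaction: the second insert is rejected and sees the first row.
lwt = Users()
print(lwt.insert_if_not_exists("pmcfadin", first, "2011-06-20 13:50:00"))
print(lwt.insert_if_not_exists("pmcfadin", second, "2011-06-20 13:50:01"))
```

The second conditional insert returns `False` together with the existing row, mirroring the `[applied]` column in the slide's output.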
  • 69. Paxos •All operations are quorum-based •Each replica sends information about unfinished operations to the leader during prepare •Paxos made Simple
  • 70. Details •4 round trips vs 1 for normal updates •Paxos state is durable •Immediate consistency with no leader election or failover •ConsistencyLevel.SERIAL •http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
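The two Paxos bullets on slide 69 (quorum-based operations, and replicas reporting unfinished operations during prepare) can be illustrated with a minimal single-decree Paxos sketch. It is a textbook simplification, not Cassandra's implementation: the names `Acceptor` and `propose` are hypothetical, state is kept in memory rather than made durable, there is no networking, and the four round trips mentioned on slide 70 are collapsed into a prepare phase and an accept phase.

```python
class Acceptor:
    """One replica's Paxos state for a single decision."""

    def __init__(self):
        self.promised = 0     # highest ballot this replica has promised
        self.accepted = None  # (ballot, value) of the last accepted proposal

    def prepare(self, ballot):
        # Promise to ignore older ballots and report any unfinished proposal.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, self.accepted

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    """Return the value chosen under this ballot, or None if no quorum."""
    quorum = len(acceptors) // 2 + 1
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < quorum:
        return None
    # If any replica reports an unfinished proposal, finish that one
    # (the one with the highest ballot) instead of our own value.
    unfinished = [p for p in promises if p is not None]
    if unfinished:
        value = max(unfinished, key=lambda bv: bv[0])[1]
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= quorum else None


replicas = [Acceptor() for _ in range(3)]
print(propose(replicas, 1, "ba27e03fd9..."))  # first proposal is chosen
print(propose(replicas, 2, "ea24e13ad9..."))  # still the first value
print(propose(replicas, 1, "stale"))          # None: ballot is too old
```

The second proposer ends up re-proposing the earlier value, which is how a chosen value stays fixed even without an elected leader.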
  • 71. Use with caution •Great for 1% of your application •Eventual consistency is your friend • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
  • 72. Cassandra 2.1
  • 73. User defined types
      CREATE TYPE address ( street text, city text, zip_code int, phones set<text> );
      CREATE TABLE users ( id uuid PRIMARY KEY, name text, addresses map<text, address> );
      SELECT id, name, addresses.city, addresses.phones FROM users;
       id       | name    | addresses.city | addresses.phones
      ----------+---------+----------------+--------------------------
       63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
  • 74. Collection indexing
      CREATE TABLE songs ( id uuid PRIMARY KEY, artist text, album text, title text, data blob, tags set<text> );
      CREATE INDEX song_tags_idx ON songs(tags);
      SELECT * FROM songs WHERE 'blues' IN tags;
       id       | album         | artist            | tags                  | title
      ----------+---------------+-------------------+-----------------------+------------------
       5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
  • 75-83. More-efficient repair (animated diagram)
  • 84. 2.1 roadmap •Efficient handling of cold data •Counters 2.0 •Only repair new-since-last-repair data •January/February 2014
  • 85. Questions?