Cassandra Summit EU 2013


  1. Cassandra 2.0 and 2.1. Jonathan Ellis, CTO, DataStax. #CASSANDRAEU
  2. Five years of Cassandra: 0.1 Jul-08 ... 0.3 Jul-09, 0.6 May-10, 0.7 Feb-11, 1.0 Dec-11, DSE, 1.2 Oct-12, 2.0 Jul-13
  3. Core values
     • Massive scalability
     • High performance
     • Reliability/Availability
     Labels: Cassandra, MySQL, HBase, Redis
  4. VLDB benchmark (RWS). Chart: throughput (ops/sec, axis 0 to 80,000) against number of nodes (axis 0 to 12). Series: Cassandra, MySQL, HBase, Redis.
  5. Endpoint benchmark (RW). Chart: throughput (ops/sec, axis 0 to 35,000) against number of nodes (1, 2, 4, 8, 16, 32). Series: Cassandra, HBase, MongoDB.
  7. New core value
     • Massive scalability
     • High performance
     • Reliability/Availability
     • Ease of use
        CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          state text,
          birth_date int
        );
        CREATE INDEX ON users(state);
        SELECT * FROM users WHERE state='Texas' AND birth_date > 1950;
  8. Native Drivers
     • CQL native protocol: efficient, lightweight, asynchronous
     • Java (GA): https://github.com/datastax/java-driver
     • .NET (GA): https://github.com/datastax/csharp-driver
     • Python (Beta): https://github.com/datastax/python-driver
     • Coming soon: PHP, Ruby
  9. Tracing
        cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2);
        Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
         activity                            | timestamp    | source    | source_elapsed
        -------------------------------------+--------------+-----------+----------------
         Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
         Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779
         Message received from /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |             63
         Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
         Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
         Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
         Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
         Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
         Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888
         Message received from /127.0.0.2    | 00:02:37,017 | 127.0.0.1 |           2334
         Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 |           2550
  10. Authentication
        [cassandra.yaml]
        authenticator: PasswordAuthenticator
        # DSE offers KerberosAuthenticator
  11. Authentication
        [cassandra.yaml]
        authenticator: PasswordAuthenticator
        # DSE offers KerberosAuthenticator
        CREATE USER robin WITH PASSWORD 'manager' SUPERUSER;
        ALTER USER cassandra WITH PASSWORD 'newpassword';
        LIST USERS;
        DROP USER cassandra;
  12. Authorization
        [cassandra.yaml]
        authorizer: CassandraAuthorizer
        GRANT select ON audit TO jonathan;
        GRANT modify ON users TO robin;
        GRANT all ON ALL KEYSPACES TO lara;
  13. Lightweight transactions
        Session 1                                       | Session 2
        SELECT * FROM users WHERE username = 'jbellis'  | SELECT * FROM users WHERE username = 'jbellis'
        [empty resultset]                               | [empty resultset]
        INSERT INTO users (...) VALUES ('jbellis', ...) | INSERT INTO users (...) VALUES ('jbellis', ...)
  14. Paxos
      • All operations are quorum-based
      • Each replica sends information about unfinished operations to the leader during prepare
      • "Paxos Made Simple"
  15. Details
      • 4 round trips vs 1 for normal updates
      • Paxos state is durable
      • Immediate consistency with no leader election or failover
      • ConsistencyLevel.SERIAL
      • http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
  16. Use with caution
      • Great for 1% of your application
      • Eventual consistency is your friend
      • http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
  17. Syntax
        INSERT INTO USERS (username, email, ...)
        VALUES ('jbellis', 'jbellis@datastax.com', ... )
        IF NOT EXISTS;
        UPDATE USERS
        SET email = 'jonathan@datastax.com', ...
        WHERE username = 'jbellis'
        IF email = 'jbellis@datastax.com';
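The race on slide 13 and the conditional statements on slide 17 can be illustrated with a toy in-memory model. This is a minimal sketch, not Cassandra and not Paxos: the class `ToyUsers` and its methods are hypothetical names, and a single lock stands in for the agreement that the replicas reach through Paxos.

```python
import threading


class ToyUsers:
    """In-memory stand-in for the users table. Hypothetical; not a Cassandra API."""

    def __init__(self):
        self._rows = {}
        # Assumption: one lock plays the role of the Paxos round across replicas.
        self._lock = threading.Lock()

    def select(self, username):
        return self._rows.get(username)

    def insert(self, username, email):
        # Unconditional write: the last writer silently wins.
        self._rows[username] = email

    def insert_if_not_exists(self, username, email):
        # Check and write happen as one step, like INSERT ... IF NOT EXISTS.
        with self._lock:
            if username in self._rows:
                return False
            self._rows[username] = email
            return True

    def update_if(self, username, expected_email, new_email):
        # Like UPDATE ... IF email = <expected>.
        with self._lock:
            if self._rows.get(username) != expected_email:
                return False
            self._rows[username] = new_email
            return True


# Slide 13: both sessions read first, then both write.
racy = ToyUsers()
session1_saw = racy.select("jbellis")
session2_saw = racy.select("jbellis")
racy.insert("jbellis", "session1@example.com")
racy.insert("jbellis", "session2@example.com")
race_winner = racy.select("jbellis")  # session 1's row has been overwritten

# Slide 17: with the conditional forms, exactly one session succeeds.
safe = ToyUsers()
first_applied = safe.insert_if_not_exists("jbellis", "jbellis@datastax.com")
second_applied = safe.insert_if_not_exists("jbellis", "someone@example.com")
updated = safe.update_if("jbellis", "jbellis@datastax.com", "jonathan@datastax.com")
stale_update = safe.update_if("jbellis", "jbellis@datastax.com", "other@example.com")
```

In the racy variant both sessions see an empty result and both inserts appear to succeed; in the conditional variant the second caller is told that its write was not applied.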
  18. Triggers
        CREATE TRIGGER <name> ON <table> USING <classname>;
  19. Trigger implementation
        class MyTrigger implements ITrigger
        {
            public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
            {
                ...
            }
        }
  20. Experimental!
      • Relies on internal RowMutation, ColumnFamily classes
      • [partition] key is a ByteBuffer
      • Expect changes in 2.1
  21. Cursors (before)
        CREATE TABLE timeline (
          user_id uuid,
          tweet_id timeuuid,
          tweet_author uuid,
          tweet_body text,
          PRIMARY KEY (user_id, tweet_id)
        );
        SELECT * FROM timeline
        WHERE (user_id = :last_key AND tweet_id > :last_tweet)
           OR token(user_id) > token(:last_key)
        LIMIT 100
  22. Cursors (after)
        SELECT * FROM timeline
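The manual paging predicate on slide 21 can be sketched against a toy table to show what the "after" form hides. Everything here is an assumption for illustration: `token` is a stand-in hash rather than Cassandra's partitioner, `ROWS` is a small in-memory table of (user_id, tweet_id) pairs, and `fetch_page` and `scan` are hypothetical helper names.

```python
import hashlib


def token(user_id):
    # Stand-in for the partitioner token; the real token function differs.
    digest = hashlib.md5(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


# Toy table ordered the way a full scan returns rows:
# by token of the partition key, then by clustering column.
ROWS = sorted(
    ((user, tweet) for user in ("alice", "bob", "carol") for tweet in range(1, 8)),
    key=lambda row: (token(row[0]), row[1]),
)


def fetch_page(last_key, last_tweet, limit):
    """One 'before' query: resume after (last_key, last_tweet), return up to limit rows."""
    if last_key is None:
        matches = ROWS
    else:
        matches = [
            (user, tweet)
            for user, tweet in ROWS
            if (user == last_key and tweet > last_tweet)
            or token(user) > token(last_key)
        ]
    return matches[:limit]


def scan(page_size):
    """The 'after' experience: iterate over every row, paging behind the scenes."""
    last_key = last_tweet = None
    while True:
        page = fetch_page(last_key, last_tweet, page_size)
        if not page:
            return
        yield from page
        last_key, last_tweet = page[-1]


all_rows = list(scan(page_size=4))
```

The caller of `scan` never handles `last_key` or `last_tweet`; that bookkeeping is what the plain `SELECT * FROM timeline` on slide 22 delegates to the server and driver.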
  23. Other CQL improvements
  24. Other CQL improvements
      • SELECT DISTINCT pk
  25. Other CQL improvements
      • SELECT DISTINCT pk
      • CREATE TABLE IF NOT EXISTS table
  26. Other CQL improvements
      • SELECT DISTINCT pk
      • CREATE TABLE IF NOT EXISTS table
      • SELECT ... AS
        • SELECT event_id, dateOf(created_at) AS creation_date
  27. Other CQL improvements
      • SELECT DISTINCT pk
      • CREATE TABLE IF NOT EXISTS table
      • SELECT ... AS
        • SELECT event_id, dateOf(created_at) AS creation_date
      • ALTER TABLE DROP column
  28. On-Heap/Off-Heap. Diagram of the Java process: on-heap is managed by GC, off-heap is not managed by GC.
  29. Read path (per sstable). Diagram labels: bloom filter. Regions: Memory, Disk.
  30. Read path (per sstable). Diagram labels: bloom filter, partition key cache. Regions: Memory, Disk.
  31. Read path (per sstable). Diagram labels: bloom filter, partition summary, partition key cache. Regions: Memory, Disk.
  32. Read path (per sstable). Diagram labels: bloom filter, partition summary, partition index, partition key cache. Regions: Memory, Disk.
  33. Read path (per sstable). Diagram labels: bloom filter, compression offsets, partition summary, partition index, partition key cache. Regions: Memory, Disk.
  34. Read path (per sstable). Diagram labels: bloom filter, compression offsets, partition summary, partition index, data, partition key cache. Regions: Memory, Disk.
  35. Off heap in 2.0. Partition key bloom filter: 1-2GB per billion partitions. (Same read path diagram.)
  36. Off heap in 2.0. Compression metadata: ~1-3GB per TB compressed. (Same read path diagram.)
  37. Off heap in 2.0. Partition index summary (depends on rows per partition). (Same read path diagram.)
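The "1-2GB per billion partitions" figure on slide 35 is roughly what the textbook bloom filter sizing formula gives, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive rate p. The sketch below is back-of-the-envelope arithmetic under that assumption; the false-positive rates are illustrative choices, and Cassandra's own sizing may differ in detail. `bloom_filter_gb` is a hypothetical helper name.

```python
import math


def bloom_filter_gb(num_keys, false_positive_rate):
    """Size in GB (10**9 bytes) of an optimally sized bloom filter (textbook formula)."""
    bits_per_key = -math.log(false_positive_rate) / (math.log(2) ** 2)
    return num_keys * bits_per_key / 8 / 1e9


# Assumed false-positive rates, chosen only for illustration.
size_at_1_percent = bloom_filter_gb(1_000_000_000, 0.01)      # about 1.2 GB
size_at_01_percent = bloom_filter_gb(1_000_000_000, 0.001)    # about 1.8 GB
```

At roughly 10 to 14 bits per partition key, a billion partitions land in the 1-2GB range, which is why moving this structure off the Java heap matters.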
  38. Compaction
      • Single-pass, always
      • LCS performs STCS in L0
  39. Healthy leveled compaction. Diagram of levels L0, L1, L2, L3, L4, L5.
  40. Sad leveled compaction. Diagram of levels L0, L1, L2, L3, L4, L5.
  41. STCS in L0. Diagram of levels L0, L1, L2, L3, L4, L5.
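The idea of running size-tiered compaction (STCS) inside L0 when leveled compaction falls behind can be sketched as a bucketing step: group sstables of similar size, then compact any bucket that has enough members. This is a simplified illustration, not Cassandra's implementation; the function names are hypothetical, and the thresholds (0.5x to 1.5x of the bucket average, at least 4 sstables) are assumed values for the example.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sstable sizes so each bucket holds sizes close to its running average."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            average = sum(bucket) / len(bucket)
            if bucket_low * average <= size <= bucket_high * average:
                bucket.append(size)
                break
        else:
            # No existing bucket is a close enough match; start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Buckets with at least min_threshold similar-sized sstables are worth merging."""
    return [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]


# Sizes (in MB, made up) of sstables piling up in L0.
l0_sizes = [10, 11, 12, 9, 100, 105, 1000]
buckets = size_tiered_buckets(l0_sizes)
candidates = compaction_candidates(l0_sizes)
```

Merging the four small sstables into one reduces the number of files a read in L0 has to consult while the leveled strategy catches up.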
  42. Rapid Read Protection NONE
  43. Cassandra 2.1
  44. User defined types
        CREATE TYPE address (
          street text,
          city text,
          zip_code int,
          phones set<text>
        )
        CREATE TABLE users (
          id uuid PRIMARY KEY,
          name text,
          addresses map<text, address>
        )
        SELECT id, name, addresses.city, addresses.phones FROM users;
         id       | name    | addresses.city | addresses.phones
        ----------+---------+----------------+--------------------------
         63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}
  45. Collection indexing
        CREATE TABLE songs (
          id uuid PRIMARY KEY,
          artist text,
          album text,
          title text,
          data blob,
          tags set<text>
        );
        CREATE INDEX song_tags_idx ON songs(tags);
        SELECT * FROM songs WHERE 'blues' IN tags;
         id       | album         | artist            | tags                  | title
        ----------+---------------+-------------------+-----------------------+------------------
         5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
  46. Inefficient bloom filters + =?
  47. Inefficient bloom filters + =
  48. Inefficient bloom filters + =
  49. Inefficient bloom filters
  50. HyperLogLog applied
  51. HLL and compaction
  52. HLL and compaction
  53. HLL and compaction
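Only the titles of slides 46 to 53 survive in this transcript, so the following is an interpretation rather than a record of the talk. Assuming the point is that compaction needs the number of distinct partition keys across overlapping sstables (to size the merged bloom filter), and that simply adding the per-sstable counts overestimates it, a mergeable HyperLogLog sketch gives a cheap estimate. The class below is a minimal textbook HyperLogLog, not Cassandra's implementation; the key names and the register count are made up.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal textbook HyperLogLog with 2**p registers and a 64-bit hash."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                 # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Two hypothetical sstables whose key ranges overlap by 20,000 keys.
sstable_a, sstable_b = HyperLogLog(), HyperLogLog()
for i in range(0, 60_000):
    sstable_a.add(f"key-{i}")
for i in range(40_000, 100_000):
    sstable_b.add(f"key-{i}")

naive_sum = sstable_a.estimate() + sstable_b.estimate()    # near 120,000: too large
merged_estimate = sstable_a.merge(sstable_b).estimate()    # near the true 100,000
```

With 4,096 registers the sketch takes a few kilobytes per sstable and its standard error is about 1.6%, so the merged estimate is close enough to size a bloom filter for the compacted output.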
  54. More-efficient repair
  55. More-efficient repair
  56. More-efficient repair
  57. More-efficient repair
  58. More-efficient repair
  59. More-efficient repair
  60. More-efficient repair
  61. More-efficient repair
  62. More-efficient repair
  63. 2.1 roadmap
      • January 2014
  64. Questions?
