Your SlideShare is downloading. ×
0
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"

5,336

Published on

Jonathan Ellis, Apache Cassandra Project Chair & DataStax Co-Founder, presents Apache Cassandra 1.2 + 2.0.

Jonathan Ellis, Apache Cassandra Project Chair & DataStax Co-Founder, presents Apache Cassandra 1.2 + 2.0.

Published in: Technology
0 Comments
8 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
5,336
On Slideshare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
Downloads
74
Comments
0
Likes
8
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  1. Cassandra 1.2 (and 2.0)Jonathan EllisProject Chair, Apache CassandraCTO, DataStax@spyced
  2. ©2012 DataStax
  3. • Massively scalable • High performance • Reliable/Available©2012 DataStax
  4. VLDB benchmark (RWS)©2012 DataStax
  5. Endpoint benchmark (RW)©2012 DataStax
  6. ©2012 DataStax
  7. ©2012 DataStax
  8. 1.2 • Concurrent schema • Atomic batches changes • CQL3 • Virtual nodes • Collections • “Fat node” support • Data dictionary • JBOD improvements • Tracing • Off-heap bloom filters, compression metadata • Parallel leveled compaction©2012 DataStax
  9. Concurrent Schema Changes CREATE TABLE X; ... DROP TABLE X; Client Cassandra Cluster Client CREATE TABLE Y; ...©2012 DataStax DROP TABLE Y;
  10. Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
  11. Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
  12. Virtual nodes A C D B E A F F B P G Ring without Ring with vnodes vnodes O H E C N I M J D L K©2012 DataStax
  13. Node Rebuild without vnodes Node 1 Node 2 Node 3 A B C F E A F B A A F B Ring without vnodes E C D D E F C B D C E D Node 4 Node 5 Node 6©2012 DataStax
  14. Node Rebuild with vnodes Node 1 Node 2 Node 3 B E A P K G G K M O C N C D D J D H J F B E A F L A K F P I P Ring with G O VNodes H N I M O E P H C M J L K I H I A B O B L M C N E F D G N J L Node 4 Node 5 Node 6©2012 DataStax
  15. JBOD support Cassandra Instance HDD1 HDD2 HDD3 HDD4©2012 DataStax
  16. JBOD support Cassandra Instance HDD1 X HDD2 HDD3 HDD4©2012 DataStax
  17. On-Heap/Off-Heap On-Heap Off-Heap Managed by GC Not managed by GC JVM Java Heap Native Memory Java Process©2012 DataStax
  18. Moving O(n) structures off-heap • Row (partition) bloom filter • 1-2GB per billion rows • Compression metadata • ~20GB per TB compressed data • 1.2 targets 5-10TB of data per machine©2012 DataStax
  19. Batches Partition Replica Coordinator Partition Client Node Replica Partition Replica©2012 DataStax
  20. Batches Partition Replica Coordinator Partition Client Node Replica Partition Replica©2012 DataStax
  21. Batches Partition Replica Coordinator Partition Client Node Replica Partition Replica©2012 DataStax
  22. Batches Partition Replica Coordinator Partition Client Node Replica Partition Replica©2012 DataStax
  23. Batches Partition Replica Client X Coordinator Node Partition Replica Partition Replica©2012 DataStax
  24. Atomic batches Partition Replica Coordinator Partition Client Node Replica Partition Batchlog Replica Node©2012 DataStax
  25. Atomic batches Partition Replica Coordinator Partition Client Node Replica Partition Batchlog Replica Node©2012 DataStax
  26. Atomic batches Partition Replica Coordinator Partition Client Node Replica Partition Batchlog Replica Node©2012 DataStax
  27. Atomic batches Partition Replica Coordinator Partition Client Node Replica Partition Batchlog Replica Node©2012 DataStax
  28. Atomic batches Partition Replica Client X Coordinator Node Partition Replica Partition Batchlog Replica Node©2012 DataStax
  29. Atomic batches Partition Replica Client X Coordinator Node Partition Replica Partition Batchlog Replica Node©2012 DataStax
  30. CQL: You got SQL in my NoSQL! CREATE TABLE users ( id uuid PRIMARY KEY, name text, state text, birth_date int ); CREATE INDEX ON users(state); SELECT * FROM users WHERE state=‘Texas’ AND birth_date > 1950;©2012 DataStax
  31. Strictly “realtime” focused • No joins • No subqueries • No aggregation functions* or GROUP BY • Strictly limited ORDER BY©2012 DataStax
  32. songscreate column family songswith key_validation_class = UUIDTypeand comparator = UTF8Type -- cell names are stringsand column_metdata = [{column_name: title, validation_class: UTF8Type} {column_name: album, validation_class: UTF8Type} {column_name: artist, validation_class: UTF8Type {column_name: data, validation_class: BytesType} a3e64f8f... title: La Grange artist: ZZ Top album: Tres Hombres 8a172618... title: Moving in Stereo artist: Fu Manchu album: We Must Obey 2b09185b... title: Outside Woman Blues artist: Back Door Slam album: Roll Away ©2012 DataStax
  33. CREATE TABLE songs ( id uuid PRIMARY KEY, title text, artist text, album text, data blob); id title artist album a3e64f8f... La Grange ZZ Top Tres Hombres 8a172618... Moving in Stereo Fu Manchu We Must Obey 2b09185b... Outside Woman Blues Back Door Slam Roll Away©2012 DataStax
  34. song_tagscreate column family song_tagswith key_validation_class = UUIDTypeand comparator = UTF8Type; a3e64f8f... blues: 1973: 8a172618... covers: 2003:©2012 DataStax
  35. CREATE TABLE song_tags ( id uuid, tag_name text, PRIMARY KEY (id, tag_name) ); a3e64f8f... blues: 1973: 8a172618... covers: 2003: id tag_name a3e64f8f... blues a3e64f8f... 1973 8a172618... covers 8a172618... 2003©2012 DataStax
  36. playlistscreate column family playlistswith key_validation_class = UUIDTypeand comparator = CompositeType(UTF8Type, UTF8Type, UTF8Type)and default_validation_class = UUIDType;62c36092... La Grange, Moving in S..., Outside Wo..., ZZ Top, : a3e64f8f... Fu Manchu, : 8a172618... Back Door ..., : 2b09185b... Tres Hombres We Must O... Roll Away©2012 DataStax
  37. playlistscreate column family playlistswith key_validation_class = UUIDTypeand comparator = CompositeType(UTF8Type, UTF8Type, UTF8Type)and default_validation_class = UUIDType;62c36092... La Grange, Moving in S..., Outside Wo..., ZZ Top, : a3e64f8f... Fu Manchu, : 8a172618... Back Door ..., : 2b09185b... Tres Hombres We Must O... Roll Away©2012 DataStax
  38. CREATE TABLE playlists ( id uuid, title text, album text, artist text, song_id uuid, PRIMARY KEY (id, title, album, artist));62c36092... La Grange, Moving in S..., Outside Wo..., ZZ Top, : a3e64f8f... Fu Manchu, : 8a172618... Back Door ..., : 2b09185b... Tres Hombres We Must O... Roll Away id title artist album song_id 62c36092... La Grange ZZ Top Tres Hombres a3e64f8f... 62c36092... Moving in Stereo Fu Manchu We Must Obey 8a172618... 62c36092...©2012 DataStax Outside Wo... Back Door Slam Roll Away 2b09185b...
  39. CollectionsCREATE TABLE songs ( id uuid PRIMARY KEY, title text, artist text, album text, tags set<text>, data blob); id title artist album tags a3e64f8f... La Grange ZZ Top Tres Hombres {blues, 1973} 8a172618... Moving in Stereo Fu Manchu We Must Obey {covers, 2003} 2b09185b... Outside Woman Blues Back Door Slam Roll Away©2012 DataStax
  40. Data dictionarycqlsh:system> SELECT * FROM schema_keyspaces; keyspace_name | durable_writes | strategy_class | strategy_options---------------+----------------+----------------+---------------------------- keyspace1 | True | SimpleStrategy | {"replication_factor":"1"} system | True | LocalStrategy | {} system_traces | True | SimpleStrategy | {"replication_factor":"1"} ©2012 DataStax
  41. Data dictionarycqlsh:system> SELECT * FROM schema_keyspaces; keyspace_name | durable_writes | strategy_class | strategy_options---------------+----------------+----------------+---------------------------- keyspace1 | True | SimpleStrategy | {"replication_factor":"1"} system | True | LocalStrategy | {} system_traces | True | SimpleStrategy | {"replication_factor":"1"} ©2012 DataStax
  42. Data dictionarycqlsh:system> SELECT * FROM schema_keyspaces; keyspace_name | durable_writes | strategy_class | strategy_options---------------+----------------+----------------+---------------------------- keyspace1 | True | SimpleStrategy | {"replication_factor":"1"} system | True | LocalStrategy | {} system_traces | True | SimpleStrategy | {"replication_factor":"1"}cqlsh:system> SELECT * FROM schema_columnfamilies WHERE keyspace_name=keyspace1 ANDcolumnfamily_name=test; ©2012 DataStax
  43. Data dictionarycqlsh:system> SELECT * FROM schema_keyspaces; keyspace_name | durable_writes | strategy_class | strategy_options---------------+----------------+----------------+---------------------------- keyspace1 | True | SimpleStrategy | {"replication_factor":"1"} system | True | LocalStrategy | {} system_traces | True | SimpleStrategy | {"replication_factor":"1"}cqlsh:system> SELECT * FROM schema_columnfamilies WHERE keyspace_name=keyspace1 ANDcolumnfamily_name=test;cqlsh:system> SELECT * FROM schema_columns WHERE keyspace_name=keyspace1 ANDcolumnfamily_name=test; ©2012 DataStax
  44. Data dictionarycqlsh:system> SELECT * FROM local; key | bootstrapped | cluster_name | cql_version | data_center | gossip_generation |partitioner | rack | release_version | ring_id| thrift_version | tokens | truncated_at-------+--------------+--------------+-------------+-------------+-------------------+---------------------------------------------+-------+----------------------+--------------------------------------+----------------+--------+-------------- local | COMPLETED | test | 3.0.0 | datacenter1 | 1352846064 |org.apache.cassandra.dht.Murmur3Partitioner | rack1 | 1.2.0-beta2-SNAPSHOT |224c55d5-21b4-42b0-8969-afc0cc04e812 | 19.35.0 | {0} | null ©2012 DataStax
  45. Data dictionarycqlsh:system> SELECT * FROM peers LIMIT 1; peer | data_center | rack | release_version | ring_id| rpc_address | schema_version | tokens-----------+-------------+-------+----------------------+--------------------------------------+-------------+--------------------------------------+----------------------- 127.0.0.3 | datacenter1 | rack1 | 1.2.0-beta2-SNAPSHOT | f6782327-ef8e-41cf-87b9-2edc287b1ffe | 127.0.0.3 | 915ed888-ddd0-3448-860c-582f4eea1bc6 |{6148914691236517204} ©2012 DataStax
  46. Request tracing cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2); Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9 activity | timestamp | source | source_elapsed -------------------------------------+--------------+-----------+---------------- Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540 Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779 Message received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63 Applying mutation | 00:02:37,016 | 127.0.0.2 | 220 Acquiring switchLock | 00:02:37,016 | 127.0.0.2 | 250 Appending to commitlog | 00:02:37,016 | 127.0.0.2 | 277 Adding to memtable | 00:02:37,016 | 127.0.0.2 | 378 Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 710 Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 888 Message received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2334 Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550©2012 DataStax
  47. Tracing an antipattern CREATE TABLE queues ( id text, created_at timeuuid, value blob, PRIMARY KEY (id, created_at) ); id created_at value myqueue 3092e86f 9b0450d30de9 myqueue 0867f47c fc7aee5f6a66 myqueue 5fc74be0 668fdb3a2196©2012 DataStax
  48. Tracing an antipattern CREATE TABLE queues ( id text, created_at timeuuid, value blob, PRIMARY KEY (id, created_at) ); id created_at value myqueue 3092e86f 9b0450d30de9 myqueue 0867f47c fc7aee5f6a66 myqueue 5fc74be0 668fdb3a2196©2012 DataStax
  49. CREATE TABLE queues ( id text, created_at timeuuid, value blob, PRIMARY KEY (id, created_at) ); id created_at value myqueue 3092e86f 9b0450d30de9 myqueue 0867f47c fc7aee5f6a66 myqueue 5fc74be0 668fdb3a2196©2012 DataStax
  50. cqlsh:foo> SELECT FROM queues WHERE id = myqueue ORDER BY created_at LIMIT 1; Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9 activity | timestamp | source | source_elapsed ------------------------------------------+--------------+-----------+--------------- execute_cql3_query | 19:31:05,650 | 127.0.0.1 | 0 Sending message to /127.0.0.3 | 19:31:05,651 | 127.0.0.1 | 541 Message received from /127.0.0.1 | 19:31:05,651 | 127.0.0.3 | 39 Executing single-partition query | 19:31:05,652 | 127.0.0.3 | 943 Acquiring sstable references | 19:31:05,652 | 127.0.0.3 | 973 Merging memtable contents | 19:31:05,652 | 127.0.0.3 | 1020 Merging data from memtables and sstables | 19:31:05,652 | 127.0.0.3 | 1081 Read 1 live cells and 100000 tombstoned | 19:31:05,686 | 127.0.0.3 | 35072 Enqueuing response to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35220 Sending message to /127.0.0.1 | 19:31:05,687 | 127.0.0.3 | 35314 Message received from /127.0.0.3 | 19:31:05,687 | 127.0.0.1 | 36908 Processing response from /127.0.0.3 | 19:31:05,688 | 127.0.0.1 | 37650 Request complete | 19:31:05,688 | 127.0.0.1 | 38047©2012 DataStax
  51. 2.0 • Eager retries • Improved compaction • Triggers • CAS (Compare-and-set) • More-efficient repair©2012 DataStax
  52. Eager retries 90% busy Client Coordinator 30% busy 40% busy©2012 DataStax
  53. Eager retries 90% busy Client Coordinator 30% busy 40% busy©2012 DataStax
  54. Eager retries 90% busy Client Coordinator 30% busy 40% busy©2012 DataStax
  55. Improved compaction • Specialized strategy for append-only with TTL • Can we do any better for a general-purpose workload?©2012 DataStax
  56. ©2012 DataStax
  57. Triggers CREATE TRIGGER foo BEFORE UPDATE ON users EXECUTE ’/var/lib/cassandra/triggers/send_registration_email.jar’©2012 DataStax
  58. Triggers class MyTrigger implements ITrigger { public Collection<RowMutation> revise(ByteBuffer key, ColumnFamily update) { ... } }©2012 DataStax
  59. CAS Session 1 Session 2 SELECT * FROM users SELECT * FROM users WHERE username = ’jbellis’ WHERE username = ’jbellis’ [empty resultset] [empty resultset] INSERT INTO users (...) INSERT INTO users (...) VALUES (’jbellis’, ...) VALUES (’jbellis’, ...)©2012 DataStax
  60. CAS • Locking does not solve this problem • 2PC does not solve this problem • Locking + 2PC does not solve this problem©2012 DataStax
  61. Paxos!©2012 DataStax
  62. Open questions • What do we call it? • Conditional write guarantee? • Atomic conditional updates? • Lightweight transactions? • What syntax do we use for CQL? UPDATE USERS SET email = ‘jonathan@datastax.com’, ... WHERE username = ’jbellis’ IF email = ‘jbellis@datastax.com’©2012 DataStax
  63. More-efficient repair©2012 DataStax
  64. More-efficient repair©2012 DataStax
  65. More-efficient repair©2012 DataStax
  66. More-efficient repair©2012 DataStax
  67. More-efficient repair©2012 DataStax
  68. More-efficient repair©2012 DataStax
  69. More-efficient repair©2012 DataStax
  70. More-efficient repair©2012 DataStax
  71. More-efficient repair©2012 DataStax
  72. More-efficient repair©2012 DataStax
  73. More-efficient repair©2012 DataStax
  74. Consequences • Repair won’t replace missing data due to hardware failure by default • Add --include-previously-repaired to force old- style full validation©2012 DataStax

×