NYC* Jonathan Ellis Keynote: "Cassandra 1.2 + 2.0"
Jonathan Ellis, Apache Cassandra Project Chair & DataStax Co-Founder, presents Apache Cassandra 1.2 + 2.0.

Presentation Transcript

  • Cassandra 1.2 (and 2.0)
    Jonathan Ellis
    Project Chair, Apache Cassandra
    CTO, DataStax
    @spyced
  • ©2012 DataStax
  • Massively scalable; high performance; reliable/available
  • VLDB benchmark (RWS) [figure]
  • Endpoint benchmark (RW) [figure]
  • 1.2
    • Concurrent schema changes
    • Atomic batches
    • CQL3
    • Virtual nodes
    • Collections
    • “Fat node” support
    • Data dictionary
    • JBOD improvements
    • Tracing
    • Off-heap bloom filters, compression metadata
    • Parallel leveled compaction
  • Concurrent Schema Changes
    [Diagram: two clients change the schema of the same Cassandra cluster at the same time]
        Client 1: CREATE TABLE X; ... DROP TABLE X;
        Client 2: CREATE TABLE Y; ... DROP TABLE Y;
  • Virtual nodes
    [Diagram: a ring without vnodes, divided into token ranges A-F, beside a ring with vnodes, divided into token ranges A-P]
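With vnodes each node owns many small token ranges instead of one large one (in 1.2 this is set per node with `num_tokens` in cassandra.yaml). The toy sketch below is not Cassandra's code: it assumes randomly chosen tokens in a Murmur3-style signed 64-bit space, and the node names, seed, and token count are illustrative assumptions. It builds a ring, finds the owner of a token, and reports how much of the ring each node owns.

```python
import bisect
import random

# Murmur3Partitioner-style signed 64-bit token range (illustrative assumption)
TOKEN_MIN, TOKEN_MAX = -2**63, 2**63 - 1
SPACE = TOKEN_MAX - TOKEN_MIN + 1  # 2**64 possible tokens


def build_ring(nodes, tokens_per_node, seed=42):
    """Give each node `tokens_per_node` random tokens; return sorted (token, node) pairs."""
    rng = random.Random(seed)
    ring = [(rng.randint(TOKEN_MIN, TOKEN_MAX), node)
            for node in nodes for _ in range(tokens_per_node)]
    ring.sort()
    return ring


def owner(ring, token):
    """The node whose token is the first one >= `token` owns it, wrapping past the end."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)
    return ring[i % len(ring)][1]


def ownership(ring):
    """Fraction of the token space each node owns; a token owns (previous token, token]."""
    if len(ring) == 1:
        return {ring[0][1]: 1.0}
    share = {}
    for i, (token, node) in enumerate(ring):
        prev = ring[i - 1][0]  # for i == 0 this wraps around to the last token
        share[node] = share.get(node, 0.0) + ((token - prev) % SPACE) / SPACE
    return share


ring = build_ring(["node1", "node2", "node3"], tokens_per_node=256)
print(owner(ring, 12345))
print({node: round(f, 3) for node, f in ownership(ring).items()})
```

With 256 tokens per node each share comes out close to one third; with a single random token per node the shares can vary widely, which is one reason many small ranges balance data more evenly.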
  • Node Rebuild without vnodes
    [Diagram: Nodes 1-6 on a ring without vnodes; each node holds three of the token ranges A-F]
  • Node Rebuild with vnodes
    [Diagram: the same Nodes 1-6 on a ring with vnodes; each node holds many of the smaller token ranges A-P]
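The two rebuild diagrams appear to contrast streaming sources: without vnodes, a replacement node can copy its data only from the few neighbors that share its ranges, while with vnodes nearly every node shares some range with it. A toy model to check this, assuming SimpleStrategy-style placement (a range's owner plus the next distinct nodes clockwise), replication factor 3, and random tokens; the function names are hypothetical and this is an illustration, not Cassandra's rebuild logic.

```python
import random


def make_ring(num_nodes, tokens_per_node, seed=1):
    """Sorted (token, node) pairs with random 63-bit tokens; nodes are 0..num_nodes-1."""
    rng = random.Random(seed)
    ring = [(rng.getrandbits(63), node)
            for node in range(num_nodes) for _ in range(tokens_per_node)]
    ring.sort()
    return ring


def replicas(ring, idx, rf):
    """Replicas for the range ending at ring[idx]: its owner plus the next
    rf - 1 distinct nodes clockwise (assumed SimpleStrategy-like placement)."""
    chosen = []
    i = idx
    while len(chosen) < rf:
        node = ring[i % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        i += 1
    return chosen


def rebuild_sources(ring, target, rf=3):
    """Other nodes that replicate at least one range `target` also replicates,
    i.e. the nodes a replacement for `target` could stream from."""
    if rf > len({node for _, node in ring}):
        raise ValueError("rf cannot exceed the number of nodes")
    sources = set()
    for idx in range(len(ring)):
        reps = replicas(ring, idx, rf)
        if target in reps:
            sources.update(n for n in reps if n != target)
    return sources


print(len(rebuild_sources(make_ring(6, 1), target=0)))    # 4 neighbors without vnodes
print(len(rebuild_sources(make_ring(6, 256), target=0)))  # every other node with 256 vnodes each
```

Without vnodes the node shares ranges only with the two nodes before and the two after it on the ring, so the rebuild load falls on four machines; with vnodes it is spread across the whole cluster.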
  • JBOD support
    [Diagram: one Cassandra instance using four disks, HDD1-HDD4]
  • JBOD support
    [Diagram: the same instance with HDD1 marked as failed (X)]
  • On-Heap/Off-Heap
    [Diagram: a Java process whose memory is split into the JVM's Java heap (on-heap, managed by the GC) and native memory (off-heap, not managed by the GC)]
  • Moving O(n) structures off-heap
    • Row (partition) bloom filter
      • 1-2 GB per billion rows
    • Compression metadata
      • ~20 GB per TB of compressed data
    • 1.2 targets 5-10 TB of data per machine
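The slide's "1-2 GB per billion rows" figure is consistent with the standard Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n keys at false-positive rate p. A quick check, where the two false-positive rates are assumptions for illustration rather than Cassandra's configured values; Cassandra's own filters may round sizes and hash counts differently.

```python
import math


def bloom_filter_bytes(num_keys, fp_rate):
    """Optimal Bloom filter size in bytes: -n * ln(p) / (ln 2)^2 bits."""
    bits = -num_keys * math.log(fp_rate) / (math.log(2) ** 2)
    return bits / 8


for p in (0.1, 0.01):  # assumed false-positive rates
    gb = bloom_filter_bytes(1_000_000_000, p) / 1e9
    print(f"fp_rate={p}: {gb:.2f} GB per billion partitions")
```

This prints about 0.60 GB at a 10% false-positive rate and about 1.20 GB at 1%. Because the size grows linearly with the number of partitions, at 5-10 TB per machine it becomes a large, long-lived structure, which is the slide's reason for moving it out of the garbage-collected heap.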
  • Batches
    [Diagram: a client sends a batch to a coordinator node, which forwards the writes to the replicas of three partitions]
  • Batches
    [Diagram: the same batch, with a failure marked (X) between the client and the coordinator node]
  • Atomic batches
    [Diagram: as for batches, plus a separate batchlog node alongside the partition replicas]
  • Atomic batches
    [Diagram: the same, with a failure marked (X) between the client and the coordinator node]
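The batch diagrams contrast what happens when the coordinator fails partway through a batch. The toy model below makes simplifying assumptions (in-memory replicas, a single batchlog node, a failure injected after a fixed number of writes, and replay triggered by an explicit call rather than a timeout) to show why storing the batch on a batchlog node before writing lets the remaining mutations be replayed. All class and function names are hypothetical; this is not Cassandra's implementation.

```python
class Replica:
    """In-memory stand-in for one partition replica."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class CoordinatorCrash(Exception):
    """Raised to simulate the coordinator dying mid-batch."""


class BatchlogNode:
    """Keeps a copy of each batch until the coordinator confirms it was applied."""

    def __init__(self):
        self.pending = {}

    def store(self, batch_id, mutations):
        self.pending[batch_id] = list(mutations)

    def remove(self, batch_id):
        self.pending.pop(batch_id, None)

    def replay(self, replicas):
        # Re-apply every batch whose coordinator never confirmed completion.
        for batch_id, mutations in list(self.pending.items()):
            for replica_idx, key, value in mutations:
                replicas[replica_idx].apply(key, value)
            del self.pending[batch_id]


def coordinate(batch_id, mutations, replicas, batchlog=None, crash_after=None):
    """Apply (replica_idx, key, value) mutations, optionally crashing after
    `crash_after` of them have been sent."""
    if batchlog is not None:
        batchlog.store(batch_id, mutations)  # copy the batch before any write
    for n, (replica_idx, key, value) in enumerate(mutations):
        if crash_after is not None and n == crash_after:
            raise CoordinatorCrash(batch_id)
        replicas[replica_idx].apply(key, value)
    if batchlog is not None:
        batchlog.remove(batch_id)  # every write went out; drop the batchlog copy


batch = [(0, "a", 1), (1, "b", 2), (2, "c", 3)]

# Plain batch: the crash leaves the batch half-applied.
plain = [Replica() for _ in range(3)]
try:
    coordinate("b1", batch, plain, crash_after=1)
except CoordinatorCrash:
    pass
print([r.data for r in plain])   # [{'a': 1}, {}, {}]

# Atomic batch: the batchlog node replays the unconfirmed batch.
atomic = [Replica() for _ in range(3)]
log = BatchlogNode()
try:
    coordinate("b2", batch, atomic, batchlog=log, crash_after=1)
except CoordinatorCrash:
    log.replay(atomic)
print([r.data for r in atomic])  # [{'a': 1}, {'b': 2}, {'c': 3}]
```

Replaying is safe here because re-applying a write that already landed ("a") just overwrites it with the same value.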
  • CQL: You got SQL in my NoSQL!
        CREATE TABLE users (
            id uuid PRIMARY KEY,
            name text,
            state text,
            birth_date int
        );
        CREATE INDEX ON users(state);
        SELECT * FROM users WHERE state = 'Texas' AND birth_date > 1950;
  • Strictly “realtime” focused
    • No joins
    • No subqueries
    • No aggregation functions* or GROUP BY
    • Strictly limited ORDER BY
  • songs
        create column family songs
            with key_validation_class = UUIDType
            and comparator = UTF8Type -- cell names are strings
            and column_metadata = [
                {column_name: title, validation_class: UTF8Type},
                {column_name: album, validation_class: UTF8Type},
                {column_name: artist, validation_class: UTF8Type},
                {column_name: data, validation_class: BytesType}
            ];

        a3e64f8f...   title: La Grange             artist: ZZ Top           album: Tres Hombres
        8a172618...   title: Moving in Stereo      artist: Fu Manchu        album: We Must Obey
        2b09185b...   title: Outside Woman Blues   artist: Back Door Slam   album: Roll Away
  • CREATE TABLE songs (
        id uuid PRIMARY KEY,
        title text,
        artist text,
        album text,
        data blob
    );

    id          | title               | artist         | album
    a3e64f8f... | La Grange           | ZZ Top         | Tres Hombres
    8a172618... | Moving in Stereo    | Fu Manchu      | We Must Obey
    2b09185b... | Outside Woman Blues | Back Door Slam | Roll Away
  • song_tags
        create column family song_tags
            with key_validation_class = UUIDType
            and comparator = UTF8Type;

        a3e64f8f...   blues:   1973:
        8a172618...   covers:  2003:
  • CREATE TABLE song_tags (
        id uuid,
        tag_name text,
        PRIMARY KEY (id, tag_name)
    );

    a3e64f8f...   blues:   1973:
    8a172618...   covers:  2003:

    id          | tag_name
    a3e64f8f... | blues
    a3e64f8f... | 1973
    8a172618... | covers
    8a172618... | 2003
  • playlists
        create column family playlists
            with key_validation_class = UUIDType
            and comparator = CompositeType(UTF8Type, UTF8Type, UTF8Type)
            and default_validation_class = UUIDType;

        62c36092...   (La Grange, ZZ Top, Tres Hombres): a3e64f8f...
                      (Moving in S..., Fu Manchu, We Must O...): 8a172618...
                      (Outside Wo..., Back Door ..., Roll Away): 2b09185b...
  • CREATE TABLE playlists (
        id uuid,
        title text,
        album text,
        artist text,
        song_id uuid,
        PRIMARY KEY (id, title, album, artist)
    );

    62c36092...   (La Grange, ZZ Top, Tres Hombres): a3e64f8f...
                  (Moving in S..., Fu Manchu, We Must O...): 8a172618...
                  (Outside Wo..., Back Door ..., Roll Away): 2b09185b...

    id          | title            | artist         | album        | song_id
    62c36092... | La Grange        | ZZ Top         | Tres Hombres | a3e64f8f...
    62c36092... | Moving in Stereo | Fu Manchu      | We Must Obey | 8a172618...
    62c36092... | Outside Wo...    | Back Door Slam | Roll Away    | 2b09185b...
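These slides map each CQL3 table back onto the older column-family layout: the first primary-key column becomes the storage row (partition) key, and the remaining primary-key columns become the cell name. A toy function that reproduces the `song_tags` and `playlists` layouts, using the slides' simplified picture of one cell per CQL row; Cassandra 1.2's actual layout for non-COMPACT STORAGE tables is more involved. The function name is hypothetical, and the ids are the slides' truncated values used as plain strings.

```python
from collections import defaultdict


def to_storage_rows(rows, partition_key, clustering, value_column=None):
    """Group CQL rows into storage rows (partitions) keyed by the partition key.

    Following the simplified picture on the slides, each CQL row becomes one
    cell whose name is the tuple of its clustering-column values and whose
    value is `value_column` (empty when there is none, as in song_tags).
    """
    partitions = defaultdict(dict)
    for row in rows:
        cell_name = tuple(row[col] for col in clustering)
        partitions[row[partition_key]][cell_name] = row[value_column] if value_column else b""
    # Cells inside a partition are kept sorted by name, as a comparator would.
    return {key: dict(sorted(cells.items())) for key, cells in partitions.items()}


song_tags = [
    {"id": "a3e64f8f...", "tag_name": "blues"},
    {"id": "a3e64f8f...", "tag_name": "1973"},
    {"id": "8a172618...", "tag_name": "covers"},
    {"id": "8a172618...", "tag_name": "2003"},
]
tags = to_storage_rows(song_tags, "id", ["tag_name"])

playlists = [
    {"id": "62c36092...", "title": "La Grange", "album": "Tres Hombres",
     "artist": "ZZ Top", "song_id": "a3e64f8f..."},
    {"id": "62c36092...", "title": "Moving in Stereo", "album": "We Must Obey",
     "artist": "Fu Manchu", "song_id": "8a172618..."},
    {"id": "62c36092...", "title": "Outside Woman Blues", "album": "Roll Away",
     "artist": "Back Door Slam", "song_id": "2b09185b..."},
]
pl = to_storage_rows(playlists, "id", ["title", "album", "artist"], "song_id")
print(tags)
print(pl)
```

All three playlist entries land in a single partition because they share the same `id`; that is what lets one playlist be read with a single partition lookup.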
  • Collections
        CREATE TABLE songs (
            id uuid PRIMARY KEY,
            title text,
            artist text,
            album text,
            tags set<text>,
            data blob
        );

        id          | title               | artist         | album        | tags
        a3e64f8f... | La Grange           | ZZ Top         | Tres Hombres | {blues, 1973}
        8a172618... | Moving in Stereo    | Fu Manchu      | We Must Obey | {covers, 2003}
        2b09185b... | Outside Woman Blues | Back Door Slam | Roll Away    |
  • Data dictionary
        cqlsh:system> SELECT * FROM schema_keyspaces;

         keyspace_name | durable_writes | strategy_class | strategy_options
        ---------------+----------------+----------------+----------------------------
         keyspace1     | True           | SimpleStrategy | {"replication_factor":"1"}
         system        | True           | LocalStrategy  | {}
         system_traces | True           | SimpleStrategy | {"replication_factor":"1"}
  • Data dictionary (continued)
        cqlsh:system> SELECT * FROM schema_columnfamilies WHERE keyspace_name = 'keyspace1' AND columnfamily_name = 'test';
        cqlsh:system> SELECT * FROM schema_columns WHERE keyspace_name = 'keyspace1' AND columnfamily_name = 'test';
  • Data dictionary
        cqlsh:system> SELECT * FROM local;

        column            | value
        key               | local
        bootstrapped      | COMPLETED
        cluster_name      | test
        cql_version       | 3.0.0
        data_center       | datacenter1
        gossip_generation | 1352846064
        partitioner       | org.apache.cassandra.dht.Murmur3Partitioner
        rack              | rack1
        release_version   | 1.2.0-beta2-SNAPSHOT
        ring_id           | 224c55d5-21b4-42b0-8969-afc0cc04e812
        thrift_version    | 19.35.0
        tokens            | {0}
        truncated_at      | null
  • Data dictionary
        cqlsh:system> SELECT * FROM peers LIMIT 1;

        column          | value
        peer            | 127.0.0.3
        data_center     | datacenter1
        rack            | rack1
        release_version | 1.2.0-beta2-SNAPSHOT
        ring_id         | f6782327-ef8e-41cf-87b9-2edc287b1ffe
        rpc_address     | 127.0.0.3
        schema_version  | 915ed888-ddd0-3448-860c-582f4eea1bc6
        tokens          | {6148914691236517204}
  • Request tracing
        cqlsh:foo> INSERT INTO bar (i, j) VALUES (6, 2);

        Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

        activity                            | timestamp    | source    | source_elapsed
        ------------------------------------+--------------+-----------+---------------
        Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 | 540
        Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 | 779
        Message received from /127.0.0.1    | 00:02:37,016 | 127.0.0.2 | 63
        Applying mutation                   | 00:02:37,016 | 127.0.0.2 | 220
        Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 | 250
        Appending to commitlog              | 00:02:37,016 | 127.0.0.2 | 277
        Adding to memtable                  | 00:02:37,016 | 127.0.0.2 | 378
        Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 | 710
        Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 | 888
        Message received from /127.0.0.2    | 00:02:37,017 | 127.0.0.1 | 2334
        Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550
  • Tracing an antipattern
        CREATE TABLE queues (
            id text,
            created_at timeuuid,
            value blob,
            PRIMARY KEY (id, created_at)
        );

        id      | created_at | value
        myqueue | 3092e86f   | 9b0450d30de9
        myqueue | 0867f47c   | fc7aee5f6a66
        myqueue | 5fc74be0   | 668fdb3a2196
  • cqlsh:foo> SELECT * FROM queues WHERE id = 'myqueue' ORDER BY created_at LIMIT 1;

    Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

    activity                                 | timestamp    | source    | source_elapsed
    -----------------------------------------+--------------+-----------+---------------
    execute_cql3_query                       | 19:31:05,650 | 127.0.0.1 | 0
    Sending message to /127.0.0.3            | 19:31:05,651 | 127.0.0.1 | 541
    Message received from /127.0.0.1         | 19:31:05,651 | 127.0.0.3 | 39
    Executing single-partition query         | 19:31:05,652 | 127.0.0.3 | 943
    Acquiring sstable references             | 19:31:05,652 | 127.0.0.3 | 973
    Merging memtable contents                | 19:31:05,652 | 127.0.0.3 | 1020
    Merging data from memtables and sstables | 19:31:05,652 | 127.0.0.3 | 1081
    Read 1 live cells and 100000 tombstoned  | 19:31:05,686 | 127.0.0.3 | 35072
    Enqueuing response to /127.0.0.1         | 19:31:05,687 | 127.0.0.3 | 35220
    Sending message to /127.0.0.1            | 19:31:05,687 | 127.0.0.3 | 35314
    Message received from /127.0.0.3         | 19:31:05,687 | 127.0.0.1 | 36908
    Processing response from /127.0.0.3      | 19:31:05,688 | 127.0.0.1 | 37650
    Request complete                         | 19:31:05,688 | 127.0.0.1 | 38047
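A trace is easiest to read by looking for the largest jump in `source_elapsed` (microseconds since the request reached that node) between consecutive events on the same node. A small sketch that does this for the queue trace; the parsing assumes the pipe-separated layout shown and the function name is hypothetical.

```python
TRACE = """\
execute_cql3_query                       | 19:31:05,650 | 127.0.0.1 | 0
Sending message to /127.0.0.3            | 19:31:05,651 | 127.0.0.1 | 541
Message received from /127.0.0.1         | 19:31:05,651 | 127.0.0.3 | 39
Executing single-partition query         | 19:31:05,652 | 127.0.0.3 | 943
Acquiring sstable references             | 19:31:05,652 | 127.0.0.3 | 973
Merging memtable contents                | 19:31:05,652 | 127.0.0.3 | 1020
Merging data from memtables and sstables | 19:31:05,652 | 127.0.0.3 | 1081
Read 1 live cells and 100000 tombstoned  | 19:31:05,686 | 127.0.0.3 | 35072
Enqueuing response to /127.0.0.1         | 19:31:05,687 | 127.0.0.3 | 35220
Sending message to /127.0.0.1            | 19:31:05,687 | 127.0.0.3 | 35314
Message received from /127.0.0.3         | 19:31:05,687 | 127.0.0.1 | 36908
Processing response from /127.0.0.3      | 19:31:05,688 | 127.0.0.1 | 37650
Request complete                         | 19:31:05,688 | 127.0.0.1 | 38047
"""


def slowest_step_per_node(trace_text):
    """Map each source node to (activity, gap_us): the event preceded by the
    largest jump in source_elapsed on that node."""
    last_elapsed = {}
    worst = {}
    for line in trace_text.strip().splitlines():
        activity, _ts, source, elapsed = (field.strip() for field in line.split("|"))
        elapsed = int(elapsed)
        gap = elapsed - last_elapsed.get(source, 0)
        last_elapsed[source] = elapsed
        if source not in worst or gap > worst[source][1]:
            worst[source] = (activity, gap)
    return worst


print(slowest_step_per_node(TRACE))
```

On this trace it reports the 33,991 µs jump on the replica 127.0.0.3 leading up to "Read 1 live cells and 100000 tombstoned", and the 36,367 µs the coordinator 127.0.0.1 spent waiting for that replica: the tombstones left behind by the queue, not the network, dominate the request.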
  • 2.0
    • Eager retries
    • Improved compaction
    • Triggers
    • CAS (compare-and-set)
    • More-efficient repair
  • Eager retries
    [Diagram: a client sends a request to a coordinator, which can send it to any of three replicas that are 90%, 30%, and 40% busy]
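Eager retries (released in 2.0 as speculative retry, configured per table with the `speculative_retry` option) let the coordinator send a read to a second replica when the first is slow, instead of waiting on a replica that is 90% busy. A toy latency model, assuming the backup request goes out after a fixed threshold and the first response wins; the function name and latencies are made up for illustration, not measurements.

```python
def effective_latency(primary_ms, backup_ms, retry_after_ms=None):
    """Latency the coordinator sees for one read.

    Without a retry it waits for the primary replica. With an eager retry it
    also asks a backup replica once `retry_after_ms` has passed and uses
    whichever answer arrives first.
    """
    if retry_after_ms is None or primary_ms <= retry_after_ms:
        return primary_ms
    return min(primary_ms, retry_after_ms + backup_ms)


# Hypothetical latencies: the primary is the 90%-busy replica, the backup is idle.
print(effective_latency(250, 8))                     # 250: stuck behind the busy replica
print(effective_latency(250, 8, retry_after_ms=20))  # 28: the backup answers first
print(effective_latency(5, 8, retry_after_ms=20))    # 5: fast primary, no retry sent
```

The threshold is the design choice: set too low, the cluster does extra reads on almost every request; set too high, slow replicas still dominate tail latency.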
  • Improved compaction
    • Specialized strategy for append-only with TTL
    • Can we do any better for a general-purpose workload?
  • Triggers
        CREATE TRIGGER foo BEFORE UPDATE ON users
            EXECUTE '/var/lib/cassandra/triggers/send_registration_email.jar'
  • Triggers
        class MyTrigger implements ITrigger {
            public Collection<RowMutation> revise(ByteBuffer key, ColumnFamily update) {
                ...
            }
        }
  • CAS
        Session 1                       Session 2
        SELECT * FROM users             SELECT * FROM users
        WHERE username = 'jbellis'      WHERE username = 'jbellis'
        [empty resultset]               [empty resultset]
        INSERT INTO users (...)         INSERT INTO users (...)
        VALUES ('jbellis', ...)         VALUES ('jbellis', ...)
  • CAS
    • Locking does not solve this problem
    • 2PC does not solve this problem
    • Locking + 2PC does not solve this problem
  • Paxos!
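The CAS slides show two sessions that both read an empty result for `jbellis` and both insert; the proposed fix is Paxos. Below is a minimal textbook single-decree Paxos with in-memory acceptors, run one step at a time, where the agreed value stands for "which session claimed the username". It is a sketch for illustration only: the names are hypothetical, and Cassandra's own implementation, which must also read the current row, check the condition, and commit the update, differs.

```python
class Acceptor:
    """One replica's Paxos state for a single value (single-decree Paxos)."""

    def __init__(self):
        self.promised = 0          # highest ballot promised so far
        self.accepted = (0, None)  # (ballot, value) most recently accepted

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots and report what was accepted."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        """Phase 2: accept unless a higher ballot has been promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def prepare_phase(acceptors, ballot):
    """Return the accepted (ballot, value) pairs from a majority of promises, else None."""
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    return promises if len(promises) > len(acceptors) // 2 else None


def choose_value(promises, own_value):
    """A proposer must adopt the highest-ballot value already accepted, if any."""
    _, value = max(promises, key=lambda pair: pair[0])
    return own_value if value is None else value


def accept_phase(acceptors, ballot, value):
    """The value is chosen once a majority of acceptors accept it."""
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return acks > len(acceptors) // 2


def propose(acceptors, ballot, own_value):
    """Run both phases; return the chosen value, or None if this ballot lost."""
    promises = prepare_phase(acceptors, ballot)
    if promises is None:
        return None
    value = choose_value(promises, own_value)
    return value if accept_phase(acceptors, ballot, value) else None


# Sequential sessions: session 2 discovers that session 1 already claimed 'jbellis'.
replicas = [Acceptor() for _ in range(3)]
first = propose(replicas, 1, "session 1")
second = propose(replicas, 2, "session 2")
print(first, second)  # session 1 session 1

# Interleaved sessions: session 2 prepares a higher ballot before session 1 accepts.
race = [Acceptor() for _ in range(3)]
p1 = prepare_phase(race, 1)
p2 = prepare_phase(race, 2)
stale = accept_phase(race, 1, choose_value(p1, "session 1"))  # rejected: ballot 1 superseded
won = accept_phase(race, 2, choose_value(p2, "session 2"))
print(stale, won)  # False True
```

In both orderings exactly one session's claim is chosen, and any later proposer is forced to adopt it instead of overwriting it.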
  • Open questions
    • What do we call it?
      • Conditional write guarantee?
      • Atomic conditional updates?
      • Lightweight transactions?
    • What syntax do we use for CQL?
        UPDATE USERS SET email = 'jonathan@datastax.com', ...
        WHERE username = 'jbellis'
        IF email = 'jbellis@datastax.com'
  • More-efficient repair
  • Consequences
    • Repair won’t replace missing data due to hardware failure by default
    • Add --include-previously-repaired to force old-style full validation
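The consequences slide follows from repair validating only data that has not already been repaired. A toy model with two replicas, each holding a few sstables flagged as repaired or not; the flag, the sstable layout, and the `include_previously_repaired` parameter (standing in for the slide's proposed `--include-previously-repaired` option) are assumptions for illustration, not Cassandra's code. It shows that rows lost from an already-repaired sstable are not restored unless the full comparison is forced.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    rows: dict
    repaired: bool = False


@dataclass
class Replica:
    sstables: list = field(default_factory=list)

    def rows(self, include_repaired):
        """Merge the rows from the sstables this repair will look at."""
        merged = {}
        for sst in self.sstables:
            if include_repaired or not sst.repaired:
                merged.update(sst.rows)
        return merged


def repair(a, b, include_previously_repaired=False):
    """Stream rows one replica has and the other lacks, then mark every sstable
    as repaired. Returns the number of rows streamed."""
    rows_a = a.rows(include_previously_repaired)
    rows_b = b.rows(include_previously_repaired)
    streamed = 0
    for src_rows, dst_rows, dst in ((rows_a, rows_b, b), (rows_b, rows_a, a)):
        missing = {k: v for k, v in src_rows.items() if k not in dst_rows}
        if missing:
            dst.sstables.append(SSTable(missing))
            streamed += len(missing)
    for sst in a.sstables + b.sstables:
        sst.repaired = True
    return streamed


# Both replicas once held k1 in a repaired sstable; replica b then lost that disk.
a = Replica([SSTable({"k1": "v1"}, repaired=True), SSTable({"k2": "v2"})])
b = Replica([SSTable({"k2": "v2"})])
print(repair(a, b))                                    # 0: k1 is never compared
print(repair(a, b, include_previously_repaired=True))  # 1: k1 streamed back to b
```

The trade-off is the one the slide names: skipping repaired data makes routine repair much cheaper, at the cost of needing an explicit full validation after a disk is lost.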