Cassandra 1.1

  1. Apache Cassandra 1.1. Jonathan Ellis / @spyced. ©2012 DataStax
  2. New features in 1.1
       • CQL3
       • Global row + key caches
       • Fine-grained data storage control
       • Row level isolation
       • Concurrent schema changes
       • Off-heap cache works on Windows
       • "Write survey mode"
       • Hadoop improvements
       • Stress tool
  3. Modern Cassandra, briefly
       • 0.7
           • CREATE COLUMN FAMILY
           • TTL
           • Secondary (column) indexes
       • 0.8
           • Counters
           • Automatic memtable tuning
       • 1.0
           • Compression
           • Leveled compaction
  4. Global row + key caches
       • cassandra.yaml
           • key_cache_size_in_mb (default 2)
           • row_cache_size_in_mb (default 0)
           • Also save periods
       • Per-CF: caching=ALL|KEYS_ONLY*|ROWS_ONLY|NONE
       • Old CF-level options are ignored
           • row_cache_size, key_cache_size
           • save periods
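The interaction between the global sizes and the per-CF caching option can be sketched roughly as follows. This is an illustrative model, not Cassandra code: the function name caches_used and its return shape are assumptions made for the example, and it assumes that a cache sized at 0 MB is effectively disabled.

```python
# Illustrative model only; the names here are hypothetical, not Cassandra APIs.
# Maps each per-CF caching mode to (uses key cache, uses row cache).
CACHING_MODES = {
    "ALL": (True, True),
    "KEYS_ONLY": (True, False),
    "ROWS_ONLY": (False, True),
    "NONE": (False, False),
}


def caches_used(caching, key_cache_size_in_mb=2, row_cache_size_in_mb=0):
    """Return which global caches a column family would actually use.

    The size defaults mirror the values quoted on the slide. A cache with
    zero capacity is treated as disabled (an assumption of this sketch).
    """
    try:
        wants_keys, wants_rows = CACHING_MODES[caching.upper()]
    except KeyError:
        raise ValueError(f"unknown caching mode: {caching!r}")
    return {
        "key_cache": wants_keys and key_cache_size_in_mb > 0,
        "row_cache": wants_rows and row_cache_size_in_mb > 0,
    }
```

With the slide's default sizes, caching=ALL would still only use the key cache until row_cache_size_in_mb is raised above 0.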
  5. Data storage
       • Old:
           • /var/lib/cassandra/data/Keyspace1/Standard1-hc-1-Data.db
       • New:
           • /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-hc-1-Data.db
           • (Includes KS in filename for easier bulk loading)
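The two layouts differ only in how the path is assembled, which a small helper can make explicit. The function sstable_path and its parameter names are hypothetical, chosen for this sketch; the path shapes are taken from the two examples on the slide.

```python
import posixpath


def sstable_path(data_dir, keyspace, column_family, version, generation,
                 component, new_layout=True):
    """Build an SSTable file path in the old (pre-1.1) or new (1.1) layout.

    Hypothetical helper for illustration; it only reproduces the two path
    shapes shown on the slide.
    """
    if new_layout:
        # 1.1: one directory per column family, keyspace repeated in the filename.
        filename = f"{keyspace}-{column_family}-{version}-{generation}-{component}"
        return posixpath.join(data_dir, keyspace, column_family, filename)
    # Pre-1.1: all column families of a keyspace share one directory.
    filename = f"{column_family}-{version}-{generation}-{component}"
    return posixpath.join(data_dir, keyspace, filename)


old = sstable_path("/var/lib/cassandra/data", "Keyspace1", "Standard1",
                   "hc", 1, "Data.db", new_layout=False)
new = sstable_path("/var/lib/cassandra/data", "Keyspace1", "Standard1",
                   "hc", 1, "Data.db")
```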
  6. Row-level isolation
       • Never see partial updates to a row
       • We now have AID from ACID
           • C in ACID != C in CAP
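One generic way to get "never see partial updates" is copy-and-swap: build the updated row privately, then publish it with a single reference assignment. The sketch below shows that technique only; the slide does not describe how Cassandra implements isolation internally, and the class IsolatedRow is a hypothetical name for this example.

```python
import threading
from types import MappingProxyType


class IsolatedRow:
    """Toy row whose readers see either all of an update or none of it."""

    def __init__(self):
        self._columns = {}                  # never mutated after publication
        self._write_lock = threading.Lock() # serializes writers

    def apply(self, updates):
        """Apply a multi-column update as one atomic step."""
        with self._write_lock:
            merged = dict(self._columns)    # private copy, invisible to readers
            merged.update(updates)
            self._columns = merged          # one reference swap publishes everything

    def snapshot(self):
        """Return a read-only view of the row as of this instant."""
        return MappingProxyType(self._columns)


row = IsolatedRow()
row.apply({"author": "jmadison", "body": "All men ..."})
before = row.snapshot()
row.apply({"author": "gwashington", "body": "To be prepared ..."})
after = row.snapshot()
```

A reader holding `before` keeps a consistent view of the old row, while `after` shows both new columns together; no view mixes the old author with the new body.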
  7. Concurrent schema changes
       • Fixes http://wiki.apache.org/cassandra/FAQ#schema_disagreement
           • Can still have temporary disagreements if you use a new CF before all nodes have it
       • Also speeds up adding new nodes
  8. Off-heap cache on Windows
       • SerializingCacheProvider (SCP) no longer requires JNA
       • SCP is the default starting with 1.0, but in versions < 1.1 it falls back to ConcurrentLinkedHashCacheProvider (CLHCP) if JNA is not present
  9. Write survey mode
       • bin/cassandra -Dcassandra.write_survey=true
       • Allows experimenting w/ compaction, compression, new versions*
       • isolate node to test reads
  10. Abortable compactions
       • nodetool stop <type>
  11. CQL3
       • (CQL2 is still default)
       • Composite PK support
           • .. slice syntax removed
           • ORDER BY syntax conforms to SQL
  12. A simple example

        CREATE TABLE tweets (
            tweet_id uuid PRIMARY KEY,
            author varchar,
            body varchar
        );
  13. Tweets

        tweet_id | author      | body
        1790     | gwashington | To be prepared for war is one of the most effectual means of preserving peace
        1787     | jmadison    | All men having power ought to be distrusted to a certain degree
        1778     | gmason      | Those gentlemen, who will be elected senators, will fix themselves in the federal town, and become citizens of that town more than of your state
  14. With clustering

        CREATE TABLE timeline (
            user_id varchar,
            tweet_id uuid,
            author varchar,
            body varchar,
            PRIMARY KEY (user_id, tweet_id)
        );

       • user_id: partition key
       • tweet_id: clustered
  15. Timeline

        user_id   | tweet_id | author      | body
        jadams    | 1787     | jmadison    | All men ...
        jadams    | 1790     | gwashington | To be prepared ...
        ahamilton | 1778     | gmason      | Those gentlemen ...
        ahamilton | 1790     | gwashington | To be prepared ...

       • Slide labels: "not clustered"; "clustered (within partition key)"
  16. Timeline, physical layout

        jadams    | (1787, author): jmadison | (1787, body): All men ...         | (1790, author): gwashington | (1790, body): To be prepared ...
        ahamilton | (1778, author): gmason   | (1778, body): Those gentlemen ... | (1790, author): gwashington | (1790, body): To be prepared ...

       • Non-PK columns contain string literal of column name
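The mapping from logical rows to the physical layout above can be modelled in a few lines. This is a toy model of the naming scheme shown on the slide, not Cassandra's storage code: the function to_cells is a hypothetical name, and integers stand in for the uuid tweet_id values just as they do on the slides.

```python
def to_cells(rows, partition_key, clustering_key):
    """Group rows by partition key and name each cell (clustering value, column name).

    Returns {partition: [((clustering_value, column_name), value), ...]} with
    the cells of each partition sorted by their composite name.
    """
    partitions = {}
    for row in rows:
        cells = partitions.setdefault(row[partition_key], {})
        for name, value in row.items():
            if name in (partition_key, clustering_key):
                continue  # primary key parts become names, not stored values
            cells[(row[clustering_key], name)] = value
    return {pk: sorted(cells.items()) for pk, cells in partitions.items()}


timeline = [
    {"user_id": "jadams", "tweet_id": 1787, "author": "jmadison", "body": "All men ..."},
    {"user_id": "jadams", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
    {"user_id": "ahamilton", "tweet_id": 1778, "author": "gmason", "body": "Those gentlemen ..."},
    {"user_id": "ahamilton", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
]
layout = to_cells(timeline, "user_id", "tweet_id")
```

Each logical row with two non-PK columns becomes two physical cells, and every cell name repeats the column name literal ("author" or "body").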
  17. WITH COMPACT

        CREATE TABLE timeline (
            user_id varchar,
            tweet_id uuid,
            author varchar,
            body varchar,
            PRIMARY KEY (user_id, tweet_id, author)
        ) WITH COMPACT STORAGE;

       • PRIMARY KEY: all but one column
       • For backwards compatibility
  18. (WITH COMPACT STORAGE, physical layout)

        jadams    | (1787, jmadison): All men ...       | (1790, gwashington): To be prepared ...
        ahamilton | (1778, gmason): Those gentlemen ... | (1790, gwashington): To be prepared ...

       • no "body" literal
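For comparison, the compact layout can be modelled the same way: the cell name is built only from the clustered primary key values, and the single remaining column supplies the cell value. As before, this is a toy model under the slide's simplifications (integer tweet_id values), and to_compact_cells is a hypothetical name.

```python
def to_compact_cells(rows, partition_key, clustering_keys, value_column):
    """Model the COMPACT STORAGE layout: one cell per row, no column-name literal.

    Returns {partition: [(clustering_values_tuple, value), ...]} with the
    cells of each partition sorted by name.
    """
    partitions = {}
    for row in rows:
        name = tuple(row[key] for key in clustering_keys)
        partitions.setdefault(row[partition_key], {})[name] = row[value_column]
    return {pk: sorted(cells.items()) for pk, cells in partitions.items()}


compact_rows = [
    {"user_id": "jadams", "tweet_id": 1787, "author": "jmadison", "body": "All men ..."},
    {"user_id": "jadams", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
    {"user_id": "ahamilton", "tweet_id": 1778, "author": "gmason", "body": "Those gentlemen ..."},
    {"user_id": "ahamilton", "tweet_id": 1790, "author": "gwashington", "body": "To be prepared ..."},
]
compact_layout = to_compact_cells(compact_rows, "user_id", ["tweet_id", "author"], "body")
```

Here each logical row maps to exactly one physical cell, which is why the table definition must place all but one column in the primary key.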
  19. Earlier changes
       • (1.0.6) Allow CF names to be qualified by keyspace for INSERT, ALTER, DELETE, TRUNCATE
           • INSERT INTO ks.cf (...) VALUES (...)
           • (SELECT was done in 1.0.1)
       • (1.0.4) ALTER CF attributes
  20. cqlsh
       • SOURCE and CAPTURE commands
       • (1.0.8) DESCRIBE COLUMNFAMILIES
  21. The future is CQL (based)
       • cqlsh
       • performance
           • prepared statements
           • netty-based transport (CASSANDRA-2478)
       • What does this mean for pycassa, Hector, et al?
  22. Hadoop Integration
       • 2I support*
       • Wide row support*
       • BulkOutputFormat
       • (*Covered in updated WordCount)
  23. Secondary Index support

        IndexExpression expr =
            new IndexExpression(
                ByteBufferUtil.bytes("int4"),
                IndexOperator.EQ,
                ByteBufferUtil.bytes(0));
        ConfigHelper.setInputRange(
            job.getConfiguration(),
  24. Wide row support

        ConfigHelper.setInputColumnFamily(
            job.getConfiguration(),
            KEYSPACE,
            COLUMN_FAMILY,
            true);

       • Also: PIG_WIDEROW_INPUT
  25. BulkOutputFormat

        job.setOutputFormatClass(
            BulkOutputFormat.class);

       • Compatible w/ CFOF + extra options
           • OUTPUT_LOCATION
           • BUFFER_SIZE_IN_MB
           • STREAM_THROTTLE_MBITS
           • (system default, 64, unlimited)
       • Limitation: can't stream to dead nodes (fix in 1.1.1?)
  26. Stress tool
       • tools/bin/stress*
       • Insert, read, seq scan, indexed scan, multiget, counter add/get
       • CQL
  27. Bonus: What's new in C* 1.1.1
       • Incremental repair by token range
       • Support for commitlog archiving and PITR
       • Identify and blacklist corrupted SSTables from future compactions
       • Open 1 sstableScanner per level for leveled compaction
       • More CQL3 improvements (e.g. reversed clustering)
       • Fix re-creating Keyspaces/ColumnFamilies with the same name as dropped ones
  28. DataStax Community, with OpsCenter