Cassandra 0.7, Los Angeles High Scalability Group

What's new in Cassandra 0.7

  1. Cassandra 0.7 (Friday, December 10, 2010)
  2. Features
      • Live schema modification
      • Secondary indexes
      • Hadoop OutputFormat
      • (Very) large rows: up to 2 billion columns
      • NetworkTopologyStrategy
  3. Operations
      • Efficient streaming
      • Per-ColumnFamily memtable thresholds
      • Much more (optional) metadata about columns
  4. Operations backports
      • Option to disable hinted handoff (HH) (0.6.2)
      • Compaction priority (0.6.3)
      • HH hourly scan (0.6.3)
      • JMX metrics for row-level Bloom filters (0.6.3)
      • Flow control (0.6.4, 0.6.5)
      • HH paging (0.6.5)
      • Dynamic snitch (0.6.5)
      • Tombstone removal in minor compaction (0.6.6)
  5. Compatibility
      • Fully backwards-compatible with 0.6 data
      • Some Thrift API changes
          • String row keys become byte[]
          • The keyspace is set once per connection
      • Upgrading requires a drain (nodetool drain) and a full cluster restart
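Because row keys are now binary, a client that used String keys with 0.6 has to encode them itself. A minimal sketch, assuming the old keys were plain text and UTF-8 is the encoding you want (the `RowKeys` helper is hypothetical, not part of any Cassandra client); it also wraps the bytes in a java.nio.ByteBuffer for client APIs that take one instead of byte[].

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: converts String row keys (0.6 style) to the binary
// keys used in 0.7, assuming the keys are text encoded as UTF-8.
final class RowKeys {
    private RowKeys() {}

    // Encode a String key as UTF-8 bytes.
    static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Wrap the encoded key for APIs that expect a ByteBuffer.
    static ByteBuffer toBuffer(String key) {
        return ByteBuffer.wrap(toBytes(key));
    }

    // Decode a binary key back to a String (only meaningful for UTF-8 text keys).
    static String fromBuffer(ByteBuffer key) {
        // duplicate() so reading does not move the caller's buffer position
        ByteBuffer copy = key.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The encoding is a client-side decision: Cassandra 0.7 just sees bytes, so every client touching the same keys must agree on it.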
  6. Features
  7. Live schema changes
      • Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07
  8. Data model tradeoffs
      • Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.”
  9. A static ColumnFamily
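The point behind the ALTER TABLE quote: a row in a ColumnFamily is essentially a sorted map from column name to value, and different rows may hold different columns, so "adding a column" is just another write to the rows that need it. A toy in-memory model of that idea follows; `ColumnFamilyModel` and the sample data are hypothetical, and Cassandra's real storage (memtables and SSTables) looks nothing like a HashMap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of a ColumnFamily: row key -> (column name -> value),
// with each row's columns kept sorted by name.
final class ColumnFamilyModel {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    // Insert or overwrite one column in one row; no other row is affected.
    void insert(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Return the row's columns, or an empty map if the row does not exist.
    SortedMap<String, String> get(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }
}
```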
  11. A dynamic ColumnFamily
  12. SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)
      [Diagram relating the followers, timeline, and tweets ColumnFamilies]
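Rather than joining at read time, a dynamic ColumnFamily lets each user's timeline be precomputed at write time: when a user tweets, the tweet ID is written into the timeline row of each of that user's followers, with columns ordered by time, so reading a timeline is a single-row slice. The sketch models that fan-out-on-write idea in memory; the `Timelines` class, its methods, and the use of a long timestamp as the column name are illustrative assumptions in the spirit of Twissandra, not its actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model of denormalized timelines: one "row" per user whose columns
// are timestamp -> tweet ID. Assumes timestamps are unique within a timeline.
final class Timelines {
    private final Map<String, Set<String>> followers = new HashMap<>();
    private final Map<String, NavigableMap<Long, String>> timeline = new HashMap<>();

    void follow(String follower, String followed) {
        followers.computeIfAbsent(followed, k -> new HashSet<>()).add(follower);
    }

    // Fan-out on write: copy the tweet ID into every follower's timeline row.
    void tweet(String author, long timestamp, String tweetId) {
        for (String f : followers.getOrDefault(author, Collections.emptySet())) {
            timeline.computeIfAbsent(f, k -> new TreeMap<>()).put(timestamp, tweetId);
        }
    }

    // Read path: newest `limit` tweet IDs from a single row, no join needed.
    List<String> read(String user, int limit) {
        List<String> out = new ArrayList<>();
        NavigableMap<Long, String> row = timeline.getOrDefault(user, new TreeMap<>());
        for (String id : row.descendingMap().values()) {
            if (out.size() == limit) break;
            out.add(id);
        }
        return out;
    }
}
```

The tradeoff is the usual one for denormalization: writes cost one insert per follower, in exchange for reads that touch a single row.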
  13. SuperColumns = full denormalization
  14. A little deeper
      • http://twissandra.com
      • http://github.com/jhermes/twissjava
  15. Secondary indexes
  16. A static ColumnFamily
  17. Demo time
      • Reading the slides after the talk? See http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
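Conceptually, a secondary index maps each value of an indexed column to the set of row keys holding that value, so an equality query becomes a lookup instead of a scan over every row. The sketch keeps such an inverted index next to the rows in memory and supports equality lookups only; `IndexedColumnFamily` and its methods are hypothetical, not Cassandra's implementation.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy secondary index on one column: value -> sorted set of row keys.
final class IndexedColumnFamily {
    private final String indexedColumn;
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    private final Map<String, Set<String>> index = new HashMap<>();

    IndexedColumnFamily(String indexedColumn) {
        this.indexedColumn = indexedColumn;
    }

    void insert(String rowKey, String column, String value) {
        Map<String, String> row = rows.computeIfAbsent(rowKey, k -> new HashMap<>());
        String old = row.put(column, value);
        if (column.equals(indexedColumn)) {
            // Keep the index consistent when an indexed value is overwritten.
            if (old != null) {
                Set<String> keys = index.get(old);
                keys.remove(rowKey);
                if (keys.isEmpty()) index.remove(old);
            }
            index.computeIfAbsent(value, k -> new TreeSet<>()).add(rowKey);
        }
    }

    // Equality query on the indexed column: one lookup, not a scan.
    Set<String> whereEquals(String value) {
        return Collections.unmodifiableSet(index.getOrDefault(value, Collections.emptySet()));
    }
}
```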
  18. Hadoop OutputFormat
          job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
          ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KS, CF);
          ...
          public void reduce(Text word, Iterable<IntWritable> values, Context context)
              throws IOException, InterruptedException
          {
              // Sum the counts for this word, then emit one mutation for row outputKey.
              int sum = 0;
              for (IntWritable val : values)
                  sum += val.get();
              context.write(outputKey, Collections.singletonList(getMutation(word, sum)));
          }
  19. Large rows
      • 0.6: maximum row size is the smaller of 2GB and what fits in memory
      • 0.7: rows larger than in_memory_compaction_limit_in_mb are compacted without being held in memory
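As we understand the 0.7 setting, it acts as a threshold: rows under in_memory_compaction_limit_in_mb are merged in memory as before, while larger rows take a slower two-pass path that never materializes the whole row. A minimal sketch of that decision; the `CompactionChoice` class, its enum, and the assumption that the comparison uses the row's serialized size in bytes are ours, not Cassandra's code.

```java
// Hypothetical illustration of how in_memory_compaction_limit_in_mb
// splits rows between two compaction paths.
final class CompactionChoice {
    enum Strategy { IN_MEMORY, INCREMENTAL }

    static Strategy choose(long rowSizeBytes, int inMemoryCompactionLimitInMb) {
        long limitBytes = (long) inMemoryCompactionLimitInMb * 1024 * 1024;
        // Rows above the limit are merged without holding them fully in memory.
        return rowSizeBytes > limitBytes ? Strategy.INCREMENTAL : Strategy.IN_MEMORY;
    }
}
```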
  20. NetworkTopologyStrategy
      • RackAwareStrategy is tuned for 3 replicas and 2 data centers
          • Renamed to OldNetworkTopologyStrategy
      • NTS allows configuring replicas per data center, per Keyspace
          • Ignores the replication_factor directive
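A simplified sketch of per-data-center placement: walk the ring clockwise from the key's token and take nodes from each data center until that data center's configured replica count is met. The total number of replicas is just the sum of the per-DC counts, which is why a single replication_factor has no role. This version ignores racks (the real strategy also tries to spread replicas across them) and uses a hypothetical `PerDcPlacement` class with integer tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified, rack-unaware sketch of per-data-center replica placement.
final class PerDcPlacement {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // token -> node
    private final Map<String, String> dcOf = new HashMap<>();      // node -> DC

    void addNode(int token, String node, String dc) {
        ring.put(token, node);
        dcOf.put(node, dc);
    }

    // replicasPerDc, e.g. {"DC1": 2, "DC2": 1}; no global replication_factor.
    List<String> replicasFor(int keyToken, Map<String, Integer> replicasPerDc) {
        Map<String, Integer> remaining = new HashMap<>(replicasPerDc);
        List<String> chosen = new ArrayList<>();
        // Start at the first node whose token >= the key's token, then wrap around.
        List<String> walk = new ArrayList<>(ring.tailMap(keyToken, true).values());
        walk.addAll(ring.headMap(keyToken, false).values());
        for (String node : walk) {
            String dc = dcOf.get(node);
            int left = remaining.getOrDefault(dc, 0);
            if (left > 0) {
                chosen.add(node);
                remaining.put(dc, left - 1);
            }
        }
        return chosen;
    }
}
```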
  21. Operations
  22. Efficient streaming
      • The following slides show how, in 0.7, we send a new node only the data component of the SSTables being moved to it. That data is contiguous on disk, so reading it needs no random I/O, and the new node rebuilds the index and Bloom filter itself.
      • This minimizes the impact on existing nodes.
  23. [Diagram: ring with nodes W, A, F, T, L; range (A-L]]
  24. [Diagram: same ring; ranges (A-F] and (F-L]]
  25. [Diagram: same ring; SSTable components Data, Index, Filter]
  26. [Diagram: same ring; SSTable components Index, Filter]
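The receiving side of this scheme can be sketched as a single sequential pass over the streamed data that records where each key's record starts. The record layout below (writeUTF key, int length, value bytes) is made up for the example and is not the real SSTable format; Bloom filter reconstruction is omitted, and `IndexRebuild` is a hypothetical class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a made-up "data file" of back-to-back records
// [writeUTF key][int value length][value bytes], in key order.
final class IndexRebuild {
    // Sender side: serialize rows (already sorted by key) contiguously.
    static byte[] writeData(Map<String, byte[]> sortedRows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> row : sortedRows.entrySet()) {
                out.writeUTF(row.getKey());
                out.writeInt(row.getValue().length);
                out.write(row.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Receiver side: one sequential pass records the offset of each key's record.
    static Map<String, Long> rebuildIndex(byte[] data) {
        ByteArrayInputStream raw = new ByteArrayInputStream(data);
        Map<String, Long> index = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(raw)) {
            while (raw.available() > 0) {
                long offset = data.length - raw.available(); // start of this record
                String key = in.readUTF();
                byte[] value = new byte[in.readInt()];
                in.readFully(value); // consume the value; only the offset is indexed
                index.put(key, offset);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }
}
```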
  27. Per-CF memtable thresholds
      • Easier tuning for large numbers of ColumnFamilies
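With thresholds attached to each ColumnFamily, a busy CF can flush on its own schedule without forcing tiny flushes on quiet ones. A minimal sketch of a per-CF flush check; the two limits loosely mirror the 0.7 per-CF settings memtable_throughput_in_mb and memtable_operations_in_millions, but the `MemtableThresholds` class and its logic are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-ColumnFamily memtable flush check.
final class MemtableThresholds {
    static final class Limits {
        final long throughputBytes;
        final long operations;

        Limits(int throughputInMb, double operationsInMillions) {
            this.throughputBytes = (long) throughputInMb * 1024 * 1024;
            this.operations = Math.round(operationsInMillions * 1_000_000);
        }
    }

    private final Map<String, Limits> limits = new HashMap<>();

    void configure(String columnFamily, int throughputInMb, double operationsInMillions) {
        limits.put(columnFamily, new Limits(throughputInMb, operationsInMillions));
    }

    // Flush when either this CF's data volume or its operation count hits its own limit.
    boolean shouldFlush(String columnFamily, long bytesWritten, long operationCount) {
        Limits l = limits.get(columnFamily);
        if (l == null) throw new IllegalArgumentException("unknown CF: " + columnFamily);
        return bytesWritten >= l.throughputBytes || operationCount >= l.operations;
    }
}
```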
  28. Column metadata
      • 0.6: comparator, subcomparator
      • 0.7 adds: default_validation_class, column_metadata
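The two 0.7 settings combine as a lookup order: a validator declared for a column in column_metadata wins, otherwise default_validation_class applies. A sketch of that order with two toy validators, an 8-byte check standing in for long values and a strict UTF-8 decode standing in for text; `ColumnValidation` and its validators are hypothetical, not Cassandra's type classes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical validation lookup: a per-column validator if declared,
// otherwise the ColumnFamily-wide default.
final class ColumnValidation {
    // Toy stand-in for a long type: exactly 8 bytes.
    static final Predicate<byte[]> LONG = v -> v.length == 8;
    // Toy stand-in for a UTF-8 type: must decode strictly.
    static final Predicate<byte[]> UTF8 = v -> {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(v));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    };

    private final Predicate<byte[]> defaultValidator;
    private final Map<String, Predicate<byte[]>> columnMetadata = new HashMap<>();

    ColumnValidation(Predicate<byte[]> defaultValidator) {
        this.defaultValidator = defaultValidator;
    }

    void declare(String column, Predicate<byte[]> validator) {
        columnMetadata.put(column, validator);
    }

    boolean isValid(String column, byte[] value) {
        return columnMetadata.getOrDefault(column, defaultValidator).test(value);
    }
}
```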
  29. Native code
      • JNA introduced in 0.6.5 for mlockall
      • Extended to hard links in 0.6.6
  30. Flow control (0.6.4)
      • Replica nodes drop hopeless requests on the floor
      • The coordinator node is unaffected
      • TimedOutException signals the client to back off
      • Requires enough memory to buffer RPCTimeout’s worth of requests
      • (In the short term, you’re still screwed)
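The replica-side rule can be sketched as follows: when a queued request is finally dequeued, if it has already waited longer than the RPC timeout, the coordinator has given up on it and doing the work would only add load. The sketch also spells out the memory requirement from the slide as arithmetic (arrival rate × timeout × request size). `ReplicaQueue`, its `Request` type, and the injected clock are hypothetical, an illustration of the idea rather than Cassandra's messaging code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical replica-side queue that drops requests the coordinator
// has already timed out on, instead of spending work on them.
final class ReplicaQueue {
    static final class Request {
        final String id;
        final long arrivedAtMillis;

        Request(String id, long arrivedAtMillis) {
            this.id = id;
            this.arrivedAtMillis = arrivedAtMillis;
        }
    }

    private final Deque<Request> queue = new ArrayDeque<>();
    private final long rpcTimeoutMillis;
    private final LongSupplier clock;
    private int dropped = 0;

    ReplicaQueue(long rpcTimeoutMillis, LongSupplier clock) {
        this.rpcTimeoutMillis = rpcTimeoutMillis;
        this.clock = clock;
    }

    void enqueue(String id) {
        queue.addLast(new Request(id, clock.getAsLong()));
    }

    // Returns the IDs actually processed; hopeless requests are discarded.
    List<String> drain() {
        List<String> processed = new ArrayList<>();
        while (!queue.isEmpty()) {
            Request r = queue.pollFirst();
            if (clock.getAsLong() - r.arrivedAtMillis > rpcTimeoutMillis) {
                dropped++; // the coordinator already timed out; the client saw TimedOutException
            } else {
                processed.add(r.id);
            }
        }
        return processed;
    }

    int droppedCount() {
        return dropped;
    }

    // Worst case: every request arriving within one timeout window is still buffered.
    static long worstCaseBufferedBytes(long requestsPerSecond, long rpcTimeoutMillis, long avgRequestBytes) {
        return requestsPerSecond * rpcTimeoutMillis / 1000 * avgRequestBytes;
    }
}
```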
  31. Flow control in 0.5
      • Why backpressure doesn’t fit Cassandra
  32. Dynamic snitch
          public void sortByProximity(List<InetAddress> addresses);
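The dynamic snitch's job is to order replicas by how they are actually performing, not by topology alone. A minimal sketch, assuming an exponentially weighted moving average of observed latencies as each node's score; that is not necessarily the scoring Cassandra uses. The `LatencySnitch` class and its `alpha` parameter are hypothetical, and the node type is generic so the example can use plain strings where the slide's signature uses InetAddress.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical latency-aware snitch: orders nodes by a smoothed latency score.
final class LatencySnitch<N> {
    private final double alpha; // weight of the newest sample, 0 < alpha <= 1
    private final Map<N, Double> score = new HashMap<>();

    LatencySnitch(double alpha) {
        this.alpha = alpha;
    }

    // Record one observed response latency; the first sample becomes the score.
    void receiveTiming(N node, double latencyMillis) {
        score.merge(node, latencyMillis, (old, sample) -> alpha * sample + (1 - alpha) * old);
    }

    // Sort in place, fastest first; nodes with no samples go last.
    public void sortByProximity(List<N> addresses) {
        addresses.sort(Comparator.comparingDouble(n -> score.getOrDefault(n, Double.MAX_VALUE)));
    }
}
```

Sorting in place matches the void signature on the slide: callers pass the replica list and read from the front.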
  33. Everything else
  34. 0.7 performance
      • Reads roughly 2x faster, thanks largely to removing String creation
      • Row-cached reads up to 8x faster after optimizations by tjake and jbellis
      • Optimizations for reads of large rows
      • Tentatively in 0.7.1: ~15% improvement across the board from ByteBuffer optimizations
  35. Thrift: the libpq of Cassandra
      • OOMs on malformed packets
      • Python Unicode string issues
      • PHP support is buggy and unmaintained
  36. Client support from Riptano
      • Hector
          • Building a JPA/JDO layer on top
      • pycassa
      • phpcassa
      • Soon: cassandra gem
  37. After 0.7.0
      • IndexOperator.GT
      • Triggers / plugins
      • Entity groups
      • On-disk data format improvements (compression, compound keys?)
  38. Summary