Cassandra 0.7, Los Angeles High Scalability Group

What's new in Cassandra 0.7

Transcript

  • 1. Cassandra 0.7 (Friday, December 10, 2010)
  • 2. Features
      • Live schema modification
      • Secondary indexes
      • Hadoop OutputFormat
      • (Very) large rows
          • up to 2 billion columns
      • NetworkTopologyStrategy
  • 3. Operations
      • Efficient streaming
      • Per-ColumnFamily settings of memtable thresholds
      • Much more (optional) metadata about columns
  • 4. Operations backports
      • Hinted handoff (HH) disable (0.6.2)
      • Compaction priority (0.6.3)
      • HH hourly scan (0.6.3)
      • JMX metrics for row-level bloom filters (0.6.3)
      • Flow control (0.6.4, 0.6.5)
      • HH paging (0.6.5)
      • Dynamic snitch (0.6.5)
      • Tombstone removal in minor compaction (0.6.6)
  • 5. Compatibility
      • Fully backwards-compatible with 0.6 data
      • Some Thrift API changes
          • String row keys become byte[]
          • keyspace is set once per connection
      • Requires drain + cluster restart
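A minimal sketch of what the String-to-byte[] row key change means for client code. The class and method names are hypothetical (not part of Cassandra or Thrift), and UTF-8 is an assumption about how existing String keys were encoded:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not a Cassandra or Thrift API.
public class RowKeys {
    // Encode a 0.6-style String row key as raw bytes (UTF-8 is assumed here).
    public static byte[] toBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Decode bytes back to a String; only meaningful if the key was written as UTF-8 text.
    public static String fromBytes(byte[] key) {
        return new String(key, StandardCharsets.UTF_8);
    }
}
```

With byte[] keys the client chooses the encoding, so every reader and writer of a ColumnFamily has to agree on it.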
  • 6. Features
  • 7. Live schema changes
      • Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07
  • 8. Data model tradeoffs
      • Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.”
  • 9. A static ColumnFamily
  • 11. A dynamic ColumnFamily
  • 12. SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)
      (diagram labels: followers, timeline, tweets)
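The alternative to the join on slide 12 can be sketched as a write-time fan-out: store each tweet once and copy its id into every follower's timeline row, so a read is a single row lookup. This is an in-memory illustration only; the class, the three maps standing in for ColumnFamilies, and all method names are assumptions, not code from the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for three ColumnFamilies; all names are illustrative.
public class TimelineSketch {
    private final Map<String, List<String>> followers = new HashMap<>(); // user -> users following them
    private final Map<String, List<String>> timeline = new HashMap<>();  // user -> tweet ids, oldest first
    private final Map<String, String> tweets = new HashMap<>();          // tweet id -> body

    public void follow(String follower, String followee) {
        followers.computeIfAbsent(followee, k -> new ArrayList<>()).add(follower);
    }

    // Write path: store the tweet once, then fan its id out to each follower's timeline row.
    public void tweet(String author, String tweetId, String body) {
        tweets.put(tweetId, body);
        for (String f : followers.getOrDefault(author, Collections.<String>emptyList())) {
            timeline.computeIfAbsent(f, k -> new ArrayList<>()).add(tweetId);
        }
    }

    // Read path: one timeline row lookup plus tweet fetches, with no join.
    public List<String> readTimeline(String user) {
        List<String> bodies = new ArrayList<>();
        for (String id : timeline.getOrDefault(user, Collections.<String>emptyList())) {
            bodies.add(tweets.get(id));
        }
        return bodies;
    }
}
```

The tradeoff is more work per write in exchange for cheap reads.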
  • 13. SuperColumns = full denormalization
  • 14. A little deeper
      • http://twissandra.com
      • http://github.com/jhermes/twissjava
  • 15. Secondary indexes
  • 16. A static ColumnFamily
  • 17. demo time
      • Reading the slides after the talk? See http://www.riptano.com/blog/whats-new-cassandra-07-secondary-indexes
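For readers without the demo, the general idea of a secondary index can be sketched as a map from a column's value back to the row keys that hold it. This is a conceptual model with hypothetical names, not Cassandra's implementation or its Thrift API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual model only: an index from one column's value back to the row keys holding it.
public class ValueIndexSketch {
    private final Map<String, String> rows = new HashMap<>();       // row key -> indexed column value
    private final Map<String, Set<String>> index = new HashMap<>(); // column value -> row keys

    // Insert or overwrite a row's indexed column, keeping the index in step with the data.
    public void put(String rowKey, String value) {
        String old = rows.put(rowKey, value);
        if (old != null) {
            index.get(old).remove(rowKey); // drop the stale index entry
        }
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(rowKey);
    }

    // Equality lookup: all row keys whose indexed column equals the given value.
    public Set<String> lookup(String value) {
        return index.getOrDefault(value, Collections.<String>emptySet());
    }
}
```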
  • 18. Hadoop OutputFormat

        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KS, CF);
        ...
        // Sum the counts for one word and emit them as a single mutation.
        public void reduce(Text word, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values)
                sum += val.get();
            context.write(outputKey, Collections.singletonList(getMutation(word, sum)));
        }
  • 19. Large rows
      • 0.6: smaller of {2GB, memory limit}
      • 0.7: in_memory_compaction_limit_in_mb
  • 20. NetworkTopologyStrategy
      • RackAwareStrategy is tuned for 3 replicas and 2 data centers
          • renamed to OldNetworkTopologyStrategy
      • NTS allows configuring replicas per data center, per Keyspace
          • ignores replication_factor directive
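A simplified sketch of per-data-center replica placement, assuming a token ring walked clockwise from the key's position until each data center has its configured number of replicas. The class, the Node type, and the integer tokens are hypothetical, and rack awareness is deliberately left out, so this is not the strategy's actual algorithm:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only; it ignores racks entirely.
public class PerDcPlacementSketch {
    public static final class Node {
        final String name;
        final String dc;
        final int token;

        public Node(String name, String dc, int token) {
            this.name = name;
            this.dc = dc;
            this.token = token;
        }
    }

    // ring must be sorted by token; replicasPerDc is the per-keyspace setting, e.g. {DC1=2, DC2=1}.
    public static List<String> replicasFor(int keyToken, List<Node> ring, Map<String, Integer> replicasPerDc) {
        // Find the first node whose token is >= the key's token (wraps to index 0 if none).
        int start = 0;
        while (start < ring.size() && ring.get(start).token < keyToken) {
            start++;
        }
        Map<String, Integer> remaining = new HashMap<>(replicasPerDc);
        List<String> replicas = new ArrayList<>();
        // Walk the ring once, taking nodes from each data center until its quota is met.
        for (int i = 0; i < ring.size(); i++) {
            Node n = ring.get((start + i) % ring.size());
            int left = remaining.getOrDefault(n.dc, 0);
            if (left > 0) {
                replicas.add(n.name);
                remaining.put(n.dc, left - 1);
            }
        }
        return replicas;
    }
}
```

The total replica count is simply the sum of the per-data-center values, which is why a single replication_factor is not consulted.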
  • 21. Operations
  • 22. Efficient Streaming
      • The following slides show how, in 0.7, we send only the data portion of the sstables being moved to the new node (it is contiguous on disk, so no random I/O); the new node then rebuilds the indexes etc.
      • This minimizes the impact on existing nodes
  • 23. (diagram labels: W, A, F, (A-L], T, L)
  • 24. (diagram labels: W, A, F, (A-F], (A-F], T, (F-L], L)
  • 25. (diagram labels: W, A, F, Data, T, L, Index, Filter)
  • 26. (diagram labels: W, A, F, T, L, Index, Filter)
  • 27. Per-CF memtable thresholds
      • Easier tuning for large numbers of ColumnFamilies
  • 28. Column Metadata
      • 0.6: comparator, subcomparator
      • 0.7: default_validation_class, column_metadata
  • 29. Native code
      • JNA introduced in 0.6.5 for mlockall
      • Extended to hard links in 0.6.6
  • 30. Flow Control (0.6.4)
      • Replica nodes drop hopeless requests on the floor
      • Coordinator node is unaffected
      • TimedOutException signals client to back off
      • Requires enough memory to buffer RPCTimeout’s worth of requests
      • (In the short term, you’re still screwed)
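One way to read "drop hopeless requests" is that a replica skips any queued request older than the RPC timeout, since the coordinator has already given up on it. The sketch below illustrates that reading with hypothetical class and field names; it is not Cassandra's code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only: a replica-side queue that discards requests older than the RPC timeout.
public class DropStaleSketch {
    public static final class Request {
        final String id;
        final long enqueuedAtMillis;

        public Request(String id, long enqueuedAtMillis) {
            this.id = id;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final Queue<Request> queue = new ArrayDeque<>();
    private final long rpcTimeoutMillis;
    private int dropped = 0;

    public DropStaleSketch(long rpcTimeoutMillis) {
        this.rpcTimeoutMillis = rpcTimeoutMillis;
    }

    public void enqueue(Request r) {
        queue.add(r);
    }

    // Drain the queue at time nowMillis, processing only requests still within the timeout.
    public List<String> drain(long nowMillis) {
        List<String> processed = new ArrayList<>();
        Request r;
        while ((r = queue.poll()) != null) {
            if (nowMillis - r.enqueuedAtMillis > rpcTimeoutMillis) {
                dropped++; // the coordinator has already timed this request out
            } else {
                processed.add(r.id);
            }
        }
        return processed;
    }

    public int droppedCount() {
        return dropped;
    }
}
```

The queue still has to hold everything that arrives within one timeout window, which matches the slide's memory caveat.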
  • 31. Flow control in 0.5
      • Why backpressure doesn’t fit Cassandra
  • 32. Dynamic snitch

        public void sortByProximity(List<InetAddress> addresses);
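A minimal sketch of what an implementation behind that signature might do, assuming each replica has a latency score where lower means faster. The class name, the score map, and the treatment of unscored hosts are all assumptions, not the dynamic snitch's actual logic:

```java
import java.net.InetAddress;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative only: order replicas by a per-host latency score, lowest first.
public class LatencySortSketch {
    private final Map<InetAddress, Double> scores; // hypothetical: lower score = faster recently

    public LatencySortSketch(Map<InetAddress, Double> scores) {
        this.scores = scores;
    }

    // Same signature as on the slide: reorders the list in place, best-scoring replica first.
    // Hosts without a score sort last (an assumption made for this sketch).
    public void sortByProximity(List<InetAddress> addresses) {
        addresses.sort(Comparator.comparingDouble(
                (InetAddress a) -> scores.getOrDefault(a, Double.MAX_VALUE)));
    }
}
```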
  • 33. Everything else
  • 34. 0.7 performance
      • Reads roughly 100% faster, thanks largely to removing String creation
      • Row-cached reads up to 8x faster after optimizations by tjake and jbellis
      • Optimizations for reads of large rows
      • 0.7.1? ~15% improvement everywhere from ByteBuffer optimizations
  • 35. Thrift: the libpq of Cassandra
      • OOMs on malformed packets
      • Python Unicode string issues
      • PHP support is buggy and maintainerless
  • 36. Client support from Riptano
      • Hector
          • Building JPA/JDO layer on top
      • pycassa
      • phpcassa
      • Soon: cassandra gem
  • 37. After 0.7.0
      • IndexOperator.GT
      • Triggers / plugins
      • Entity groups
      • On-disk data format improvements (Compression, compound keys?)
  • 38. Summary