Bay Area Cassandra Meetup 2011
Published in: Technology
  • Transcript

    • 1. Cassandra 0.7
    • 2. Features
        • Live schema modification
        • Secondary indexes
        • Hadoop OutputFormat
        • (Very) large rows
            • up to 2 billion columns
        • NetworkTopologyStrategy
    • 3. Operations
        • Efficient streaming
        • Per-ColumnFamily settings of memtable thresholds
        • Much more (optional) metadata about columns
    • 4. Operations backports
        • Hinted handoff (HH) disable (0.6.2)
        • Compaction priority (0.6.3)
        • HH hourly scan (0.6.3)
        • JMX metrics for row-level bloom filters (0.6.3)
        • Flow control (0.6.4, 0.6.5)
        • HH paging (0.6.5)
        • Dynamic snitch (0.6.5)
        • Tombstone removal in minor compaction (0.6.6)
    • 5. Compatibility
        • Fully backwards-compatible with 0.6 data
        • Some Thrift API changes
            • String row keys become byte[]
            • keyspace is set once per connection
        • Requires drain + cluster restart
    • 6. Features
    • 7. Live schema changes
        • Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07
    • 8. Data model tradeoffs
        • Twitter: “Fifteen months ago, it took two weeks to perform ALTER TABLE on the statuses [tweets] table.”
    • 9. A static ColumnFamily
    • 10. A dynamic ColumnFamily
    • 11. SELECT * FROM tweets WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)
        • [Diagram: followers, timeline, tweets]
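
One common way to avoid the nested query above is to denormalize: write each new tweet's id into a per-reader timeline row when the tweet is posted, so a read is a single row lookup. The following is a minimal sketch of that idea using in-memory maps as stand-ins for the followers, timeline, and tweets ColumnFamilies named on the slide. The class `TimelineSketch` and all of its methods are hypothetical illustrations, not Cassandra client code and not necessarily how the speaker's demo works.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of three ColumnFamilies (not Cassandra API).
class TimelineSketch {
    // followers: row key = author, columns = readers who follow that author
    private final Map<String, List<String>> followers = new HashMap<>();
    // tweets: row key = tweet id, value = tweet body
    private final Map<Long, String> tweets = new HashMap<>();
    // timeline: row key = reader, columns = tweet ids, newest first
    private final Map<String, Deque<Long>> timeline = new HashMap<>();
    private long nextTweetId = 1;

    void follow(String reader, String author) {
        followers.computeIfAbsent(author, k -> new ArrayList<>()).add(reader);
    }

    // Store the tweet once, then copy its id into every follower's timeline row.
    long post(String author, String body) {
        long id = nextTweetId++;
        tweets.put(id, body);
        for (String reader : followers.getOrDefault(author, Collections.emptyList())) {
            timeline.computeIfAbsent(reader, k -> new ArrayDeque<>()).addFirst(id);
        }
        return id;
    }

    // Reading a timeline touches one timeline row plus the referenced tweets.
    List<String> readTimeline(String reader, int limit) {
        List<String> result = new ArrayList<>();
        for (Long id : timeline.getOrDefault(reader, new ArrayDeque<>())) {
            if (result.size() == limit) break;
            result.add(tweets.get(id));
        }
        return result;
    }
}
```

The tradeoff in this sketch is more work per write (one timeline insert per follower) in exchange for reads that need no join.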
    • 12. SuperColumns = full denormalization
    • 13. A little deeper
        • http://twissandra.com
        • http://github.com/jhermes/twissjava
    • 14. Secondary indexes
    • 15. A static ColumnFamily
    • 16. demo time
    • 17. Hadoop OutputFormat
        job.setOutputFormatClass(ColumnFamilyOutputFormat.class);
        ConfigHelper.setOutputColumnFamily(job.getConfiguration(), KS, CF);
        ...
        public void reduce(Text word, Iterable<IntWritable> values, Context context)
        {
            int sum = 0;
            for (IntWritable val : values)
                sum += val.get();
            context.write(outputKey, Collections.singletonList(getMutation(word, sum)));
        }
    • 18. Large rows
        • 0.6: smaller of {2GB, memory limit}
        • 0.7: 2 billion columns
            • in_memory_compaction_limit_in_mb
    • 19. NetworkTopologyStrategy
        • RackAwareStrategy is tuned for 3 replicas and 2 data centers
            • renamed to OldNetworkTopologyStrategy
        • NTS allows configuring replicas per data center, per Keyspace
            • ignores replication_factor directive
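
To illustrate what "replicas per data center" means, here is a simplified sketch: walk the token ring clockwise from the key's position and accept a node only while its data center still needs replicas. The `NtsSketch` class, its `Node` type, and `replicasFor` are hypothetical names for this illustration. The sketch ignores racks entirely, so it should not be read as Cassandra's actual placement code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, rack-unaware model of per-data-center replica placement.
class NtsSketch {
    static final class Node {
        final String name;
        final String dc;
        final long token;

        Node(String name, String dc, long token) {
            this.name = name;
            this.dc = dc;
            this.token = token;
        }
    }

    static List<String> replicasFor(long keyToken, List<Node> ring,
                                    Map<String, Integer> replicasPerDc) {
        List<Node> sorted = new ArrayList<>(ring);
        sorted.sort(Comparator.comparingLong(n -> n.token));

        // First node whose token is >= the key's token; wrap to index 0 if none.
        int start = 0;
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).token >= keyToken) {
                start = i;
                break;
            }
        }

        // Walk the ring once, taking nodes only from data centers that still need replicas.
        Map<String, Integer> remaining = new HashMap<>(replicasPerDc);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Node n = sorted.get((start + i) % sorted.size());
            int left = remaining.getOrDefault(n.dc, 0);
            if (left > 0) {
                result.add(n.name);
                remaining.put(n.dc, left - 1);
            }
        }
        return result;
    }
}
```

With a per-data-center map such as {DC1: 2, DC2: 1}, the total replica count is implied by the map, which is consistent with the slide's note that a single replication_factor is ignored.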
    • 20. Operations
    • 21. Efficient Streaming
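    • 22. [Diagram: nodes W, A, F, T, L; range (A-L]]
    • 23. [Diagram: nodes W, A, F, T, L; ranges (A-F], (A-F], (F-L]]
    • 24. [Diagram: nodes W, A, F, T, L; Data, Index, Filter]
    • 25. [Diagram: nodes W, A, F, T, L; Index, Filter]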
    • 26. Per-CF memtable thresholds
        • Easier tuning for large numbers of ColumnFamilies
    • 27. Column Metadata
        • 0.6: comparator, subcomparator
        • 0.7: default_validation_class, column_metadata
    • 28. Native code
        • JNA introduced in 0.6.5 for mlockall
        • Extended to hard links in 0.6.6 for snapshots
        • 0.7.1: posix_fadvise / fcntl for writes
    • 29. Flow Control (0.6.4)
        • Replica nodes drop hopeless requests on the floor
            • Coordinator node is unaffected
            • TimedOutException signals client to back off
            • Requires enough memory to buffer RPCTimeout’s worth of requests
        • (In the short term, you’re still screwed)
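
The "drop hopeless requests" idea can be sketched as a queue that discards any request that has already waited longer than the RPC timeout, since the client will have received a timeout for it anyway. `DropHopelessSketch` and its methods are hypothetical names for this illustration, and timestamps are passed in explicitly to keep the example deterministic. It is a sketch of the policy under those assumptions, not Cassandra's internal implementation.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative replica-side queue that skips requests older than the RPC timeout.
class DropHopelessSketch {
    private static final class Request {
        final String id;
        final long enqueuedAtMillis;

        Request(String id, long enqueuedAtMillis) {
            this.id = id;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final Queue<Request> queue = new ArrayDeque<>();
    private final long rpcTimeoutMillis;
    private int dropped;

    DropHopelessSketch(long rpcTimeoutMillis) {
        this.rpcTimeoutMillis = rpcTimeoutMillis;
    }

    void enqueue(String id, long nowMillis) {
        queue.add(new Request(id, nowMillis));
    }

    // Returns the id of the next request still worth executing, or null if none remain.
    String nextLive(long nowMillis) {
        Request r;
        while ((r = queue.poll()) != null) {
            if (nowMillis - r.enqueuedAtMillis > rpcTimeoutMillis) {
                dropped++; // nobody is waiting for this answer any more
                continue;
            }
            return r.id;
        }
        return null;
    }

    int droppedCount() {
        return dropped;
    }
}
```

The queue still has to hold roughly one timeout's worth of requests before the drops take effect, which matches the memory caveat on the slide.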
    • 30. Flow control in 0.5
        • Why backpressure doesn’t fit Cassandra
    • 31. Dynamic snitch
        public void sortByProximity(List<InetAddress> addresses);
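
A latency-aware implementation of the `sortByProximity` signature above might look like the following sketch, which keeps a moving-average latency score per host and sorts the lowest score first. The class `LatencySnitchSketch`, the `recordLatency` and `ip` helpers, the 0.75/0.25 weighting, and the choice to sort unscored hosts first are all assumptions made for illustration. Only the `sortByProximity` signature comes from the slide.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative latency-based ordering; not Cassandra's DynamicEndpointSnitch.
class LatencySnitchSketch {
    private final Map<InetAddress, Double> scores = new HashMap<>();

    // Fold a new latency sample into a moving average (weights chosen arbitrarily).
    void recordLatency(InetAddress host, double millis) {
        scores.merge(host, millis, (old, sample) -> 0.75 * old + 0.25 * sample);
    }

    // Sort in place, lowest score first; hosts with no samples score 0.0 and sort first.
    public void sortByProximity(List<InetAddress> addresses) {
        addresses.sort(Comparator.comparingDouble(a -> scores.getOrDefault(a, 0.0)));
    }

    // Hypothetical helper: builds 10.0.0.<lastOctet> without any DNS lookup.
    static InetAddress ip(int lastOctet) {
        try {
            return InetAddress.getByAddress(new byte[] {10, 0, 0, (byte) lastOctet});
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because scores change as samples arrive, the same list of replicas can be ordered differently from one request to the next.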
    • 32. Everything else
    • 33. 0.7 performance
        • Reads roughly 100% faster, thanks largely to removing String creation
        • Row-cached reads up to 8x faster after optimizations by tjake and jbellis
        • Optimizations for reads of large rows
        • 0.7.1: even more speed (fix nagle + delayed acks, zero-copy reads, ...)
    • 34. Thrift: the libpq of Cassandra
        • OOMs on malformed packets
        • Python Unicode string issues
        • PHP support is buggy and maintainerless
    • 35. Client support from Riptano
        • Hector
            • Building JPA/JDO layer on top
        • pycassa
        • phpcassa
        • Soon: cassandra gem
    • 36. After 0.7.0
        • Distributed counters
        • IndexOperator.GT
        • Entity groups
        • 1.0?
