Cassandra Anti-Patterns (in 5m)
Matthew F. Dennis // @mdennis

Short lightning talk on Apache Cassandra Anti-Patterns from Cassandra SF 2011


  1. Cassandra Anti-Patterns (in 5m)
     Matthew F. Dennis // @mdennis
  2. Non-Sun (err, Non-Oracle) JVM
     ● No OpenJDK
     ● No Blackdown (anyone still use this?)
     ● Etc, etc, etc; just use the Sun (Oracle) JVM
     ● At least u22, but in general the latest release (unless you have specific reasons otherwise)
  3. CommitLog+Data On The Same Disk
     ● Don't put the commit log and data directories on the same set of spindles
       – commit log gets a single spindle entirely to itself (standard consumer SATA disks easily sustain > 80 MB/s in sequential writes)
     ● DOES NOT APPLY TO SSDs or EC2
       ● SSDs have no seek time
       ● EC2 ephemeral drives are still virtualized (but not the same as EBS)
       ● On EC2 or SSDs: use one RAID set for both the commit log and data directories
  4. EBS volumes on EC2
     ● Sounds great, nice feature set, but …
       ● Not predictable
       ● “freezes” are common
       ● Throughput limited in many cases
     ● Use ephemeral drives instead
       ● Stripe them
       ● Both commit log and data directory on the same RAID set
  5. Oversized JVM heaps
     ● 6 – 8 GB is good (assuming sufficient RAM on your boxen)
     ● 10 – 12 GB is possible and in some circumstances “correct”
     ● 16GB == max JVM heap size
     ● > 16GB => badness
     ● JVM heap ~= boxen RAM => badness (always)
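The rules of thumb on this slide can be written down as a small check. This is only a sketch: the function name `classify_heap`, the labels it returns, and the 75% threshold standing in for "heap ~= RAM" are assumptions made for illustration, since the slide gives no exact cutoff. Sizes the slide does not mention are reported as unspecified rather than guessed.

```python
def classify_heap(heap_gb, ram_gb):
    """Classify a JVM heap size using the slide's rules of thumb."""
    # Assumption: "heap ~= RAM" is read as heap >= 75% of RAM.
    # The slide says this case is always bad, so it is checked first.
    if heap_gb >= 0.75 * ram_gb:
        return "bad"
    # Anything above the 16 GB ceiling is bad.
    if heap_gb > 16:
        return "bad"
    if 6 <= heap_gb <= 8:
        return "good"
    if 10 <= heap_gb <= 12:
        return "possible"
    # The slide makes no statement about other sizes.
    return "unspecified"


print(classify_heap(8, 32))    # good
print(classify_heap(12, 64))   # possible
print(classify_heap(20, 128))  # bad
print(classify_heap(8, 8))     # bad
```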
  6. JVM heap size -v- GC suckage
     [Figure: GC suckage plotted against JVM heap size, with points marked at ~6GB, ~10GB and ~16GB]
  7. Large batch mutations (large in number of distinct rows)
     ● Timeout / failure => entire mutation must be retried => wasted work
     ● Larger mutations => higher likelihood of timeout
     ● 1000 mutations to perform? Do 100 batches of 10 in parallel instead of one batch of 1000
     ● Exact number of rows/batch is variable depending on HW, network, load, etc; experiment! (10-100 is a good starting point)
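The "100 batches of 10 in parallel" advice can be sketched roughly as follows. Nothing here is a Cassandra client API: `send_batch` is a hypothetical callable standing in for whatever batch call your client provides, and it is assumed to raise `TimeoutError` on failure. The batch size, worker count and retry limit are placeholder values to experiment with.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def apply_in_batches(mutations, send_batch, batch_size=10, workers=8, max_retries=3):
    """Send many small batches in parallel; return the batches that still failed."""

    def send_with_retry(batch):
        for _ in range(max_retries):
            try:
                send_batch(batch)  # hypothetical client call
                return True
            except TimeoutError:
                continue  # only this small batch is retried, not all the work
        return False

    batches = chunked(mutations, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send_with_retry, batches))
    return [batch for batch, ok in zip(batches, results) if not ok]


if __name__ == "__main__":
    sent = []
    failed = apply_in_batches(list(range(1000)), sent.append)
    print(len(sent), "batches sent,", len(failed), "failed")
```

The point of the design is that a timeout costs one small batch of retried work instead of the whole set of 1000 mutations.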
  8. OPP / BOP partitioner
     ● You probably shouldn't use it
       ● No really, you almost certainly shouldn't use it
     ● Creates hot spots
     ● Requires “baby sitting” from ops
     ● Not as well tested, nor is it as widely deployed
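A toy simulation can illustrate why ordered partitioning tends to create hot spots. It is not Cassandra code: the four-node ring, the string range boundaries, the `user%04d` keys and the MD5-based `hashed_token` helper are all assumptions chosen for the example. Keys that sort close together land on one node under an order-preserving layout, while hashing spreads them around the ring.

```python
import bisect
import hashlib
from collections import Counter

NODES = 4
# Order-preserving ring: each node owns a contiguous, sorted range of raw keys.
ORDERED_RING = ["g", "n", "t", "~"]
# Hash-based ring: evenly spaced tokens over a 127-bit token space.
HASHED_RING = [i * 2 ** 127 // NODES for i in range(NODES)]


def owner(token, ring):
    """Index of the first node whose token is >= `token`, wrapping around."""
    return bisect.bisect_left(ring, token) % len(ring)


def hashed_token(key):
    """Map a key to a 127-bit token via MD5 (a simplified hash partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") >> 1


keys = ["user%04d" % i for i in range(1000)]
ordered_load = Counter(owner(k, ORDERED_RING) for k in keys)
hashed_load = Counter(owner(hashed_token(k), HASHED_RING) for k in keys)

print("ordered:", dict(ordered_load))  # every key falls in the same range
print("hashed: ", dict(hashed_load))   # keys spread across all nodes
```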
  9. C* auto selection of tokens
     ● Always specify your initial token.
     ● Auto select doesn't do what you think it does nor does it do what you want
       – loadbalance is even worse, it doesn't currently do what you think, what you want or what it claims; “F#@* my cluster” would be a much more apt name than “loadbalance”
       – Future (next?) release of OPSC will remove your balancing woes
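As a sketch of specifying tokens yourself: assuming the RandomPartitioner, whose token space runs from 0 to 2**127, evenly spaced initial tokens for a single-datacenter ring can be computed as below. The function name is illustrative, and each value would go into the corresponding node's `initial_token` setting.

```python
def initial_tokens(node_count):
    """Evenly spaced tokens over the 0..2**127 RandomPartitioner token space."""
    return [i * (2 ** 127) // node_count for i in range(node_count)]


for node, token in enumerate(initial_tokens(4)):
    print("node %d: initial_token = %d" % (node, token))
```

A ring with multiple datacenters or a different partitioner needs a different calculation, so treat this as a starting point only.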
  10. Super Columns
      ● 10 – 15 percent performance penalty on reads and writes
      ● Easier / better to use composite columns
        – 0.8.x makes this a lot easier
        – Done manually in 0.7.x and is still better
      ● Devs working in C* code despise (loathe?) them
      ● API probably won't be deprecated, but implementation will be replaced behind the scenes with composites (may be “ok” at that point to use them, but should probably just use composite API directly)
      ● Cassandra and DataStax are committed to maintaining the API going forward, even if the implementation changes
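The idea of replacing a super column with composite column names can be modeled in plain Python. This is an in-memory illustration only, not a client API: the nested dict stands in for a row of super columns, tuples stand in for composite names, and the function names and sample data are made up for the example.

```python
def flatten_super_columns(row):
    """Turn {super_name: {sub_name: value}} into {(super_name, sub_name): value}."""
    return {
        (sup, sub): value
        for sup, subs in row.items()
        for sub, value in subs.items()
    }


def slice_by_prefix(flat_row, prefix):
    """Columns whose composite name starts with `prefix`, in sorted order."""
    return [(name, value) for name, value in sorted(flat_row.items())
            if name[0] == prefix]


# Hypothetical row that would otherwise use two super columns.
row = {
    "address": {"city": "SF", "zip": "94105"},
    "name": {"first": "Ann"},
}
flat = flatten_super_columns(row)
print(slice_by_prefix(flat, "address"))
```

Because composite names sort component by component, everything that used to live under one super column stays adjacent and can still be read as one slice.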
  11. Read Before Write
      ● Race conditions
      ● Abuses/Thrashes cache (row, key and page)
      ● Increases latency
      ● Increases IO requirements (by a lot)
      ● Increases size in the client
  12. Winblows
      ● Try to avoid it, you'll be happier
        – Not always possible? Then, “I'm sorry for your pain”
      ● Run nix (in particular, probably Linux)
        ● Easier to get help (IRC, email, meetups, etc)
        ● C* performs better
        ● Better tested
        ● Cheaper
        ● More widely deployed (by a lot)
  13. Q?
      Cassandra Anti-Patterns
      Matthew F. Dennis // @mdennis