State of Cassandra, August 2010

Professional Cassandra support and services

Tuesday, August 10, 2010

Cassandra: Present & Future
Jonathan Ellis
@spyced


Cassandra 0.6 & 0.7
Jonathan Ellis
@spyced


Quiet change of policy

• 0.5.1 was bug fixes only
• Too early to be strict about bugfix-only
policy in stable branch, especially w/ 0.7
being longer/more break-y
• Maybe after 1.0?


[Chart: mails sent per month, y-axis 0 to 1500. Jan (0.5), Feb (0.5.1), Mar, Apr (0.6, 0.6.1), May (0.6.2), Jun (0.6.3), Jul (0.6.4)]


Lots of bug fixes

• 85 issues marked Resolved/Fixed in 0.6
branch after 0.6 released


Runtime configuration

• concurrent reads, writes (0.6.2)
• making it easier to bandage your foot after you
shoot it

• PhiConvictThreshold (0.6.2)
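PhiConvictThreshold tunes the phi accrual failure detector. A minimal sketch of the idea follows, assuming heartbeat intervals are modeled as exponentially distributed; the class name, the window size, and the threshold of 8 used here are illustrative assumptions, not Cassandra's actual code.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Toy phi accrual failure detector (hypothetical, for illustration)."""

    def __init__(self, convict_threshold=8.0, window=1000):
        self.convict_threshold = convict_threshold  # assumed value
        self.intervals = deque(maxlen=window)       # recent heartbeat gaps
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat arrival at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of P(heartbeat is merely late)."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        # Exponential model: P(gap > elapsed) = exp(-elapsed / mean),
        # so phi = -log10(P) = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def should_convict(self, now):
        return self.phi(now) > self.convict_threshold
```

A higher threshold tolerates longer pauses before marking a node down, at the cost of slower detection of real failures.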


Performance

• JVM GC defaults (0.6.2)
• Faster commitlog (0.6.2)
• Faster range slice, Hadoop jobs (0.6.1, 2)
• Better parallelization of multiget (0.6.4)
• UTF8Type, UUIDType optimizations (0.6.5)


Bulletproofing
• HH disable (0.6.2)

• compaction priority (0.6.3)

• HH hourly scan (0.6.3)

• JMX metrics for row-level bloom filters (0.6.3)

• Flow control (0.6.4, 5)

• HH paging (0.6.5)

• Dynamic snitch (0.6.5)


Hinted Handoff
• 0.6.0: send hints to natural replicas

• 0.6.0: fix row-level concurrency bottleneck

• 0.6.2: option to disable entirely

• 0.6.3: remove hourly scan

• 0.6.4: lower priority

• 0.6.5: paging of large hinted rows

• 0.7.0: large rows


Why keep HH around?

https://www.cloudkick.com/blog/2010/jan/12/visual-ec2-latency/


Compaction priority

-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Dcassandra.compaction.priority=1

Extended to HH in 0.6.4


http://www.javamex.com/tutorials/threads/priority_what.shtml


JMX for bloom filters

• o.a.c.db:ColumnFamilyStores
• getBloomFilterFalsePositives

• [not in nodetool yet]
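To make concrete what a false-positive counter measures, here is a small sketch of a Bloom filter guarding a lookup. Everything in it (TrackedBloomFilter, lookup, the SHA-1 based hashing) is hypothetical and only illustrates the concept; it is not how Cassandra implements its filters or the JMX attribute.

```python
import hashlib


class TrackedBloomFilter:
    """Toy Bloom filter that counts false positives (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity
        self.false_positives = 0

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def lookup(bloom, rows, key):
    """Consult the filter first; count the cases where it was wrong."""
    if not bloom.might_contain(key):
        return None  # definite miss: the expensive read is skipped
    if key not in rows:
        bloom.false_positives += 1  # filter said "maybe", data said "no"
        return None
    return rows[key]
```

A rising false-positive count relative to total lookups suggests the filter is too small for the number of keys it covers.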


Flow control in 0.5

• Why backpressure doesn’t fit Cassandra


Flow Control in 0.6.4
• Replica nodes drop hopeless requests on
the floor
• Coordinator node is unaffected

• TimedOutException signals client to back off

• Requires enough memory to buffer
RPCTimeout’s worth of requests

• (In the short term, you’re still screwed)
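The "drop hopeless requests" idea can be sketched as follows: a replica checks how long a request has been queued and discards it if the coordinator must already have timed out. ReplicaStage, buffer_bytes_needed, and the 10-second timeout are assumptions made for illustration, not names or values taken from Cassandra.

```python
from collections import deque

RPC_TIMEOUT_MS = 10_000  # illustrative value, not a documented default


class ReplicaStage:
    """Toy request queue that drops requests older than the RPC timeout."""

    def __init__(self, rpc_timeout_ms=RPC_TIMEOUT_MS):
        self.rpc_timeout_ms = rpc_timeout_ms
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, request, now_ms):
        self.queue.append((now_ms, request))

    def process_next(self, now_ms, handler):
        """Run the next live request; skip ones nobody is waiting for."""
        while self.queue:
            enqueued_at, request = self.queue.popleft()
            if now_ms - enqueued_at > self.rpc_timeout_ms:
                self.dropped += 1  # the coordinator has already given up
                continue
            return handler(request)
        return None


def buffer_bytes_needed(requests_per_sec, avg_request_bytes, rpc_timeout_ms):
    """Rough memory needed to hold one RPC timeout's worth of requests."""
    return requests_per_sec * avg_request_bytes * rpc_timeout_ms // 1000
```

For example, 10,000 requests per second at 1,000 bytes each with a 10-second timeout needs roughly 100 MB of buffer, which is why the memory requirement on the slide matters.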

Flow Control, 0.6.4
[Diagram: IncomingTcpConnection feeds the Message Deserializer stage (uncapped), which feeds the Read, Gossip, and Mutation stages; Read and Mutation are capped at 4096]


Flow Control, 0.6.5

[Diagram: Read, Gossip, and Mutation stages, uncapped]


Dynamic snitch

• sortByProximity
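A rough sketch of what a latency-aware sortByProximity could look like: replicas are ordered by a score derived from recently observed latencies. The class, the method names, and the simple mean-latency score are assumptions for illustration and may differ from Cassandra's dynamic snitch.

```python
from collections import defaultdict, deque


class LatencyAwareSnitch:
    """Toy snitch that orders endpoints by mean recent latency."""

    def __init__(self, window=100):
        # Keep only the most recent `window` samples per endpoint.
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record_latency(self, endpoint, latency_ms):
        self.samples[endpoint].append(latency_ms)

    def score(self, endpoint):
        """Lower is better; endpoints with no samples score 0.0."""
        observed = self.samples.get(endpoint)
        if not observed:
            return 0.0
        return sum(observed) / len(observed)

    def sort_by_proximity(self, endpoints):
        # sorted() is stable, so ties keep the caller's original order.
        return sorted(endpoints, key=self.score)
```

Scoring unknown endpoints as 0.0 is an optimistic choice: a node with no history gets tried first, which lets the snitch gather samples for it.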


Open problems

• Linux/mmap/swap unholy trio (0.6.5)
• Memory fragmentation (0.6.5?)
• Compaction effect on caches (0.7.1?)


mmap and swap
• The problem
• Mitigations
• mmap_index_only

• swappiness=0

• turn off swap

• mlockall at startup (Xms=Xmx)

GC Fragmentation

• Culprit of infamous CASSANDRA-1014?
• Mitigation: tune with much larger new
generation / tenuring threshold?


Compaction and caches

• Compaction wrecks the OS fs cache
• Wrecks Cassandra key cache, too
• (but not row cache)


New in 0.7

• live schema changes
• large rows
• secondary indexes
• efficient streaming
• DatacenterStrategy


Live schema changes

• Details: http://www.riptano.com/blog/live-schema-updates-cassandra-07


Large rows

• 0.6: smaller of {2GB, memory limit}
• 0.7: in_memory_compaction_limit_in_mb


Secondary indexes


Streaming in 0.6
[Diagrams: ring with nodes A, F, L, T, W. Range (A-L] is split into (A-F] and (F-L]; Data, Index, and Filter files are shown]

Streaming in 0.7
[Diagram: ring with nodes A, F, L, T, W; Index and Filter files are shown]

DatacenterStrategy

• RackAwareStrategy is tuned for 3 replicas
and 2 data centers
• DS allows configuring replicas per data
center, per Keyspace
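One way to picture per-data-center replica counts is a ring walk that keeps picking nodes until each data center's quota is met. The function below is a simplified sketch under that assumption; place_replicas and its arguments are hypothetical, and the real DatacenterStrategy placement rules may differ.

```python
def place_replicas(ring, start_index, replicas_per_dc):
    """Walk the ring clockwise, filling each data center's replica quota.

    ring: list of (node, datacenter) tuples in token order.
    start_index: position of the first node at or after the key's token.
    replicas_per_dc: e.g. {"dc1": 2, "dc2": 1}, configured per keyspace.
    """
    remaining = dict(replicas_per_dc)
    chosen = []
    for offset in range(len(ring)):
        node, dc = ring[(start_index + offset) % len(ring)]
        if remaining.get(dc, 0) > 0:
            chosen.append(node)
            remaining[dc] -= 1
        if not any(remaining.values()):
            break  # every data center has its quota
    return chosen


ring = [("n1", "dc1"), ("n2", "dc1"), ("n3", "dc2"),
        ("n4", "dc1"), ("n5", "dc2")]
print(place_replicas(ring, 0, {"dc1": 2, "dc2": 1}))
```

Because the quotas are passed in per call, two keyspaces can use different replica counts per data center on the same ring.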


Minor features in 0.7

• read_repair_chance
• per-keyspace request scheduling
• Hadoop OutputFormat
• Per-CF settings that used to be global
(gc_grace_seconds, memtable thresholds)
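As an illustration of read_repair_chance, the setting can be read as the probability that a given read also triggers a repair check across replicas. The helper below is a hypothetical sketch of that coin flip, not Cassandra's read path.

```python
import random


def should_read_repair(read_repair_chance, rng=random):
    """Return True for roughly `read_repair_chance` of all calls.

    rng.random() is in [0.0, 1.0), so a chance of 0.0 never repairs
    and a chance of 1.0 always does.
    """
    return rng.random() < read_repair_chance


# Assumed example value: repair on about 10% of reads.
rng = random.Random(42)
repairs = sum(should_read_repair(0.1, rng) for _ in range(10_000))
print(f"{repairs} of 10000 reads triggered read repair")
```

Lowering the chance reduces the extra read load on replicas, at the cost of inconsistencies being repaired less often.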


0.7 API changes

• String keys become byte[]
• Thrift keyspace argument moved to
set_keyspace
• i64 timestamp becomes Clock
• SlicePredicate for _count methods
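With keys as byte[] rather than String, a client chooses the encoding itself and is no longer limited to text keys. The helpers below are a hypothetical sketch of encoding a few key types to bytes; the function names and the big-endian layout are assumptions, not part of any Cassandra client.

```python
import struct
import uuid


def string_key(key: str) -> bytes:
    """Text key: encode explicitly instead of relying on a default."""
    return key.encode("utf-8")


def long_key(value: int) -> bytes:
    """64-bit signed integer key, big-endian (an assumed convention)."""
    return struct.pack(">q", value)


def uuid_key(value: uuid.UUID) -> bytes:
    """UUID key as its 16 raw bytes."""
    return value.bytes
```

Encoding explicitly on the client also sidesteps ambiguity about which character set a text key uses on the wire.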


0.7 performance
• Reads roughly 100% faster, thanks largely to
removing String creation
• Row-cached reads up to 8x faster after
optimizations by tjake and jbellis
• Optimizations for reads of large rows
• 0.7.1? ~20% improvement everywhere from
Thrift optimizations


Thrift

• OOMs on malformed packets
• Python Unicode string issues
• PHP support is buggy and maintainerless


After 0.7.0
• IndexOperator.GT
• Triggers / plugins
• Avro?
• On-disk data format improvements
(compression, hierarchical data?)
• Auth

Questions

