Cassandra Compression and Performance Evaluation

Even though we have abandoned Cassandra in all our products, we would like to share our work here.

Why did we abandon Cassandra in our products? Because:
(1) There is a big flaw in Cassandra's implementation, especially in its local storage engine layer, i.e. SSTable and indexing.
(2) Combining Bigtable and Dynamo was a big mistake. Dynamo's hash ring architecture is an obsolete technology for scaling, and its consistency and replication policy is also unusable for big-data storage.

  • Our implementation is based on 0.6.x.

    In fact, in our codebase we ultimately place the indexes before the column blocks, to avoid too many seeks.
    The current SSTable implementation of Cassandra is poor at supporting flexible row sizes, because all columns of a row must be loaded into memory.

  1. Cassandra Performance Evaluation with Compression
     Schubert Zhang, May 2010, schubert.zhang@gmail.com

     The current implementation of Cassandra's storage layer and indexing mechanism only allows compression at the row level.

     Column Family Row Serialization Structure:

     1. The old structure:
        • bloom filter: Len (int), HashCount (int), BitSet
        • block index: index size (int), then for each block: FirstColumnName (Len(short) + name), LastColumnName (Len(short) + name), Offset (long, 0 for the first block), Block Width (long)
        • deletion meta: localDeletionTime (int), markedForDeleteAt (long)
        • column count (int)
        • column block 0 (uncompressed): Column0, Column1, Column2, Column3, ...
        • column block 1 (uncompressed), ...
        • each column: deleteMark (bool), timestamp (long), value (byte[])

     2. The new structure (to support compression):
        The new structure accommodates both the old (uncompressed) and the new (compressed) formats.
        • format (int): -1 (old format), 0 (new, LZO compressed), 1 (new, GZ compressed), 2 (new, uncompressed)
        • bloom filter: Len (int), HashCount (int), BitSet
        • deletion meta: localDeletionTime (int), markedForDeleteAt (long)
        • column count (int)
        • column block 0 (compressed or not): Column0, Column1, Column2, Column3, ...
        • column block 1 (compressed or not), ...
        • each column: deleteMark (bool), timestamp (long), value (byte[])
        • block index: index size (int), then for each block: FirstColumnName (Len(short) + name), LastColumnName (Len(short) + name), Offset (long, 0 for the first block), Block Width (long), Size on Disk (int)
        • index size' (int)

     If the first int (format) is -1, the structure that follows is the same as the old structure, except that the block index uses the new layout.
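To make the byte order of the new layout concrete, here is a minimal Java sketch of the write path, assuming GZ compression via the JDK's GZIPOutputStream (the LZO path would go through a third-party codec). NewRowFormatSketch and ColumnBlock are illustrative types, not Cassandra classes, and "index size" is interpreted here simply as the count of block index entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Illustrative sketch of the new row layout; not Cassandra source code. */
public class NewRowFormatSketch {

    static final int FORMAT_OLD = -1, FORMAT_LZO = 0, FORMAT_GZ = 1, FORMAT_PLAIN = 2;

    /** One serialized block of columns plus the fields its index entry needs. */
    public static class ColumnBlock {
        byte[] serializedColumns;   // Column0, Column1, ... already serialized
        byte[] firstColumnName;     // written as Len(short) + name
        byte[] lastColumnName;
        long offset;                // 0 for the first block
        long width;                 // uncompressed block width
    }

    public void writeRow(DataOutputStream out,
                         int hashCount, byte[] bloomFilterBits,
                         int localDeletionTime, long markedForDeleteAt,
                         int columnCount, List<ColumnBlock> blocks) throws IOException {
        out.writeInt(FORMAT_GZ);                      // format marker: 1 = GZ compressed

        out.writeInt(bloomFilterBits.length);         // bloom filter: Len, HashCount, BitSet
        out.writeInt(hashCount);
        out.write(bloomFilterBits);

        out.writeInt(localDeletionTime);              // deletion meta
        out.writeLong(markedForDeleteAt);

        out.writeInt(columnCount);                    // total columns in the row

        int[] sizesOnDisk = new int[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {     // each column block compressed independently
            byte[] compressed = gzip(blocks.get(i).serializedColumns);
            sizesOnDisk[i] = compressed.length;
            out.write(compressed);
        }

        out.writeInt(blocks.size());                  // index size (entry count in this sketch)
        for (int i = 0; i < blocks.size(); i++) {
            ColumnBlock b = blocks.get(i);
            writeShortBytes(out, b.firstColumnName);  // FirstColumnName
            writeShortBytes(out, b.lastColumnName);   // LastColumnName
            out.writeLong(b.offset);                  // Offset
            out.writeLong(b.width);                   // Block Width
            out.writeInt(sizesOnDisk[i]);             // Size on Disk: lets a reader skip compressed blocks
        }
        out.writeInt(blocks.size());                  // index size' repeated at the end of the row
    }

    private static void writeShortBytes(DataOutputStream out, byte[] name) throws IOException {
        out.writeShort(name.length);                  // Len(short) + name
        out.write(name);
    }

    private static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(buf);
        gz.write(raw);
        gz.close();
        return buf.toByteArray();
    }
}
```

Writing the block index after the column blocks, with the index size repeated at the end, lets the writer stream compressed blocks without knowing their on-disk sizes in advance.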
  2. Benchmark:
     1. Just one single node (only one disk, 4 GB RAM with 3 GB for the JVM heap, 4 cores)
     2. Dataset: ~200 bytes per column (Thrift compactly encoded; the original CSV string is ~250 bytes), 100,000 keys, 500,000,000 columns in total, ~5,000 columns per key on average
     3. Key cache and row cache both disabled
     4. The write/read client has 4 threads and executes 10,000 read operations in total
     5. Every read operation reads only the first 100 columns of the specified key
     6. The read performance is measured after major compaction, i.e. with only one SSTable

     Compression Performance Matrix:

     Field     Criteria              Uncompressed (Default)   Compressed (GZ)   Compressed (LZO)
     Size      Disk Space            104.545 GB               45.067 GB         54.656 GB
               Compression Ratio     1/1                      1/2.3             1/1.9
     Compact   Major Time (h:mm)     3:16                     5:30              3:08
               Row Max Size (B)      1186948                  512475            624396
     Write     Throughput (ops/s)    12635                    11806             11034
               Avg Latency (ms)      0.320                    0.334             0.347
               Min Latency (ms)      0.079                    0.083             0.089
               Max Latency (ms)      19331                    5128              10227
               Local Latency (ms)    0.032                    0.033             0.037
     Read      Throughput (ops/s)    25                       28                25
               Avg Latency (ms)      159                      144               159
               Min Latency (ms)      1                        2                 1
               Max Latency (ms)      1038                     1526              619
               Local Latency (ms)    159                      144               159

     Notes:
     1. The bottleneck for writes is CPU and memory.
        a) In theory, we could get better performance with a more powerful CPU and more RAM.
        b) If the commit log were stored on a dedicated disk, we could also get better results.
     2. The bottleneck for reads is disk utilization (100%).
        a) Too many seeks.
        b) Every read needs 2 seeks to reach the row, so a read operation spends at least 20 ms on disk seeks; the maximum throughput is therefore 50 ops/s.
        c) If the row is compressed, one additional seek within the row is needed.
     3. The compression ratio improves as the average row size grows.
        a) Since our dataset is very random, the ratio is only about 1/2.
     4. Compaction is CPU-bound, since compaction is single-threaded. Gzip compression is slower.
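The ratio and seek numbers above follow from simple arithmetic; splitting the 20 ms into two seeks of roughly 10 ms each is an assumption, since the slide only gives the total:

```latex
\text{GZ ratio}  = \frac{45.067\ \text{GB}}{104.545\ \text{GB}} \approx \frac{1}{2.3}, \qquad
\text{LZO ratio} = \frac{54.656\ \text{GB}}{104.545\ \text{GB}} \approx \frac{1}{1.9}

t_{\text{read}} \geq 2 \times t_{\text{seek}} \approx 2 \times 10\ \text{ms} = 20\ \text{ms}
\quad\Rightarrow\quad
\text{max read throughput} \approx \frac{1000\ \text{ms/s}}{20\ \text{ms/op}} = 50\ \text{ops/s}
```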
  3. Configuration:

     Parameter                   Value
     KeysCached                  0
     DiskAccessMode              standard
     SlicedBufferSizeInKB        64
     FlushDataBufferSizeInMB     32
     FlushIndexBufferSizeInMB    8
     ColumnIndexSizeInKB         64
     MemtableThroughputInMB      128
     ConcurrentReads             16
     ConcurrentWrites            64
     CommitLogSync               periodic
     CommitLogSyncPeriodInMS     10000

     Encoding + Compression:
     1. The original text CSV column: ~250 bytes
     2. With Thrift compact encoding: ~200 bytes
     3. Encoding + compression together give a composite reduction ratio of ~1/3

     Read throughput/latency vs. slice size (count of columns):
     Tested on LZO-compressed data; 10,000 read operations executed in total (see the client sketch below).

     Criteria             Slice Size 50   Slice Size 500   Slice Size 5000
     Throughput (ops/s)   25              21               15
     Avg Latency (ms)     158.865         186.571          256.837
     Min Latency (ms)     1.278           5.041            60.934
     Max Latency (ms)     288.307         395.427          1223.202
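A slice read of the shape used in these tests, written against the Cassandra 0.6 Thrift API, might look like the sketch below. The host, port, keyspace ("Keyspace1"), column family ("Standard1"), and row key are placeholders; this is only an illustration of the read call, not the actual benchmark client.

```java
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class SliceReadSketch {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);          // placeholder host/port
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // Ask for the first 100 columns of one row, in column-name order
        SliceRange range = new SliceRange(new byte[0], new byte[0], false, 100);
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        List<ColumnOrSuperColumn> columns = client.get_slice(
                "Keyspace1",                       // placeholder keyspace
                "row-key-00001",                   // placeholder row key
                new ColumnParent("Standard1"),     // placeholder column family
                predicate,
                ConsistencyLevel.ONE);

        System.out.println("got " + columns.size() + " columns");
        transport.close();
    }
}
```

Changing the slice count (100 above) to 50, 500, or 5000 reproduces the slice-size sweep in the table.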
  4. [Charts: Read Throughput and Read Latency vs. slice size (50 / 500 / 5000 columns), plotting the figures from the table above.]

     Read throughput/latency with key cache, mmap, etc.:
     Tested on LZO-compressed data with the same benchmark; 10,000 read operations executed in total.

     Criteria             KeyCache=100%, standard   KeyCache=0, mmap_index_only   KeyCache=0, mmap
     Throughput (ops/s)   40                        40                            84
     Avg Latency (ms)     100.522                   101.762                       47.342
     Min Latency (ms)     1.566                     1.453                         1.270
     Max Latency (ms)     278.975                   267.120                       239.816

     [Chart: Read Throughput (ops/s) for KeyCache_standard (40), mmap_index_only (40), mmap (84).]

     However, over a long evaluation run the performance with mmap is unstable; the following evaluation executed 1,000,000 read operations. This may be because of GC.
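To show what the DiskAccessMode comparison contrasts, here is a minimal JDK sketch (not Cassandra's code) of the two read styles; the file name, offset, and buffer size are placeholders, and the mapping assumes a data file smaller than 2 GB.

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class DiskAccessModeSketch {
    public static void main(String[] args) throws Exception {
        String path = "Standard1-123-Data.db";    // placeholder SSTable data file
        int offset = 4096;                        // placeholder row offset taken from the index
        byte[] buf = new byte[64 * 1024];

        // DiskAccessMode=standard: an explicit seek plus a read syscall per access
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            raf.readFully(buf);
        }

        // DiskAccessMode=mmap: map the file once; later reads are plain memory accesses
        // served from the OS page cache, without a read syscall per access
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            map.position(offset);
            map.get(buf);
        }
    }
}
```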
  5. [Chart: Read Throughput (mmap) over time, in ops/s, sampled once per minute over roughly 280 minutes.]
