Large partition in Cassandra

Shogo Hoshii
Yahoo! Japan Corp.

About me
• Cassandra operator at Yahoo! Japan Corp.
• https://issues.apache.org/jira/browse/CASSANDRA-5977

Remark
• This is a summary of the following tickets:
– https://issues.apache.org/jira/browse/CASSANDRA-11206
– https://issues.apache.org/jira/browse/CASSANDRA-9738

Agenda
• Recap the read path
• What’s the problem?
• Solutions

High level: read path
(Diagram: Row Cache, Key Cache, SSTables, MemTable)
1. Check row cache before going to key cache
2. Check the key cache to get the offsets to data
3. Find the offsets to data and retrieve data
4. Merge data from sstables and memtable
5. Populate row cache with new row returned
http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlAboutReads.html
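The five steps above can be sketched as a toy model. This is only an illustration, not Cassandra's code: the `SSTable` and `Node` classes, the dict-based caches, and the "memtable is newest" merge rule are all assumptions made for the example (real Cassandra reconciles cells by timestamp).

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


class SSTable:
    """Hypothetical stand-in for an SSTable: rows stored at list offsets."""

    def __init__(self, rows: Dict[str, Row]) -> None:
        self.data: List[Row] = list(rows.values())
        # Partition key -> offset into self.data (plays the role of the index).
        self.index: Dict[str, int] = {k: i for i, k in enumerate(rows)}


class Node:
    def __init__(self, sstables: List[SSTable], memtable: Dict[str, Row]) -> None:
        self.sstables = sstables
        self.memtable = memtable
        self.row_cache: Dict[str, Row] = {}
        self.key_cache: Dict[Tuple[int, str], int] = {}

    def read(self, key: str) -> Optional[Row]:
        # 1. Check row cache before going to key cache.
        if key in self.row_cache:
            return self.row_cache[key]
        merged: Row = {}
        for sstable_id, sstable in enumerate(self.sstables):
            # 2. Check the key cache to get the offset to data.
            offset = self.key_cache.get((sstable_id, key))
            if offset is None:
                # 3. Find the offset through the index and remember it.
                offset = sstable.index.get(key)
                if offset is None:
                    continue
                self.key_cache[(sstable_id, key)] = offset
            # 4. Merge data from SSTables ...
            merged.update(sstable.data[offset])
        # ... and from the memtable (assumed to hold the newest values here).
        merged.update(self.memtable.get(key, {}))
        if not merged:
            return None
        # 5. Populate row cache with the row being returned.
        self.row_cache[key] = merged
        return merged
```

A second `read` of the same key returns straight from `row_cache`, which is Pattern 1 on the next slide.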

Pattern 1. The row is in row cache
(Diagram: read path components across Heap, Off Heap, and Disk: MemTable, Row Cache, Key Cache, Bloom Filter, Partition Summary, Compression Offsets, Partition Index, Data)
1. read request
2. Return the row if it is in the row cache

Pattern 2. The key is in key cache
(Diagram: same read path components as in Pattern 1)
1. read request
2. Check bloom filters
3. Check whether the partition key is in the key cache
4. Find the offset to the result set
5. Access the result set

Pattern 3. The key is not cached
(Diagram: same read path components as in Pattern 1)
1. read request
2. Row cache miss -> Check bloom filters
3. Check whether the partition key is in the key cache
4. Key cache miss -> Binary search the partition summary for the approximate location in the partition index
5. Disk scan to find the offsets
6. Find the offset into the result set
8. Update key cache
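The uncached lookup above can be sketched roughly as follows. This is a simplified illustration under several assumptions: the bloom filter is modelled as a plain set, keys are ordered as strings rather than by token, and `build_summary` and `find_offset` are hypothetical helper names, not Cassandra functions.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Set, Tuple

IndexEntry = Tuple[str, int]  # (partition key, offset into the data file)


def build_summary(index: List[IndexEntry], interval: int) -> List[Tuple[str, int]]:
    """Sample every `interval`-th index entry: (key, position in the index)."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def find_offset(
    key: str,
    bloom: Set[str],
    key_cache: Dict[str, int],
    summary: List[Tuple[str, int]],
    index: List[IndexEntry],
) -> Optional[int]:
    # Check the bloom filter (a set here, so no false positives by itself).
    if key not in bloom:
        return None
    # Check whether the partition key is in the key cache.
    if key in key_cache:
        return key_cache[key]
    # Miss: binary search the summary for the closest preceding sample.
    sampled_keys = [k for k, _ in summary]
    slot = bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None
    start = summary[slot][1]
    # Scan the index forward from that position to find the offset.
    for candidate, offset in index[start:]:
        if candidate == key:
            key_cache[key] = offset  # Update key cache.
            return offset
        if candidate > key:
            break
    return None
```

The key cache entry written at the end is what later slides identify as the source of heap pressure when the partition is large.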

What’s the problem?
• GC pressure caused by the key cache when a large partition is read

Partition Index Recap
• http://distributeddatastore.blogspot.jp/2013/08/cassandra-sstable-storage-format.html

RowIndexEntry
• Partition size < 64 KB
– RowIndexEntry
• Position
• Serialized size of data
• Partition size > 64 KB
– IndexedEntry
• Position
• Serialized size of data
• IndexInfo[]
– Serialize method
– Offset
– Width
– Etc.
Approximation with 16-byte values (partition size : index size / object count)
1 MB : 3 KB / > 200 objects
4 MB : 11 KB / > 800 objects
64 MB : 180 KB / > 13k objects
512 MB : 1.4 MB / > 106k objects
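A back-of-the-envelope check of these figures, assuming one IndexInfo entry is written per 64 KB of partition data (the threshold quoted on this slide). The function name `index_info_count` and the per-entry ratios are illustrative only; the index sizes and object counts are the approximate values from the slide.

```python
COLUMN_INDEX_SIZE_KB = 64  # assumed: one IndexInfo per 64 KB block


def index_info_count(partition_size_kb: int, block_kb: int = COLUMN_INDEX_SIZE_KB) -> int:
    """Estimate how many IndexInfo entries a partition of this size carries."""
    if partition_size_kb < block_kb:
        return 0  # small partitions get a plain RowIndexEntry
    return -(-partition_size_kb // block_kb)  # ceiling division


# (partition size in KB, index size in bytes, object count), from the slide.
slide_figures = [
    (1 * 1024, 3 * 1024, 200),
    (4 * 1024, 11 * 1024, 800),
    (64 * 1024, 180 * 1024, 13_000),
    (512 * 1024, int(1.4 * 1024 * 1024), 106_000),
]

for size_kb, index_bytes, objects in slide_figures:
    entries = index_info_count(size_kb)
    print(
        f"{size_kb // 1024:>4} MB: {entries:>5} entries, "
        f"~{index_bytes / entries:.0f} B and ~{objects / entries:.1f} objects per entry"
    )
```

Under that assumption, the slide's numbers work out to on the order of 180 bytes and a dozen or so heap objects per IndexInfo entry, so both grow linearly with partition size.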

Pattern 3. The key is not cached
(Diagram: same read path components as in Pattern 1)
1. read request
2. Row cache miss -> Check bloom filters
3. Check whether the partition key is in the key cache
4. Key cache miss -> Binary search the partition summary for the approximate location in the partition index
5. Disk scan to find the offsets
6. Find the offsets into the result set
8. Update key cache
9. GC, GC, GC…

Current solution
• If the serialized index size of the partition < column_index_cache_size_in_kb (configurable)
– IndexedEntry is kept on heap
• Otherwise
– IndexInfo is always read from disk when needed
• https://issues.apache.org/jira/browse/CASSANDRA-11206
• https://www.youtube.com/watch?v=qa84vABqftM
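The rule above might be sketched like this. It is a minimal illustration, not the CASSANDRA-11206 implementation: `CachedEntry`, `make_cache_entry`, and the 2 KB default are assumptions for the example, as is comparing the threshold against the serialized size of the IndexInfo entries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexInfo:
    offset: int
    width: int


@dataclass
class CachedEntry:
    position: int  # where the partition starts in the data file
    # None means "shallow" entry: IndexInfo must be re-read from disk on use.
    index_infos: Optional[List[IndexInfo]]


def make_cache_entry(
    position: int,
    index_infos: List[IndexInfo],
    serialized_index_bytes: int,
    column_index_cache_size_in_kb: int = 2,  # assumed default, check your cassandra.yaml
) -> CachedEntry:
    """Keep IndexInfo on heap only when its serialized form is small enough."""
    if serialized_index_bytes < column_index_cache_size_in_kb * 1024:
        return CachedEntry(position, index_infos)
    return CachedEntry(position, None)
```

With this shape, a large partition costs the key cache only a position, at the price of a disk read for its IndexInfo on each access.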

Other possible solutions
• Never keep IndexInfo on heap
– Read from disk when needed
– Degrades performance when a small partition is read

Other possible solutions
• Migrate key cache to be fully off heap
– https://issues.apache.org/jira/browse/CASSANDRA-9738
– Serialization and deserialization cost too much when a large partition is read
• Will Birch help us solve this problem?
– https://issues.apache.org/jira/browse/CASSANDRA-9754
