Cassandra under the hood
Richard Low (rlow@acunu.com)
  • Cassandra internals

    1. Cassandra under the hood
       Richard Low, rlow@acunu.com
    2. Outline
       • What happens when you write?
         • Commit logs
         • Memtables
         • SSTables
       • What happens when you read?
         • Point queries
         • Range queries
       • Repair and snapshots
       [Figure: example row "richard": { "email": "rlow@acunu.com" }]
    3. Why should we care?
       • Helps understand performance
       • Understand performance implications of data model
       • Helps to fix it if something goes wrong
       • Interesting!
    4. Writes
    5. Writes (2)
       [Diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data }]
    6. Commit log
       [Diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data }]
    7. Commit log
       • Each insert written to commit log first
       • Stored in insertion order
       • Inserts not acknowledged until written to commit log
       • Batch vs periodic
       • In case of crash, can replay
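
The commit log bullets above can be sketched as a small append-only file with a replay step. This is a minimal illustration, not Cassandra's implementation: the `CommitLog` class, the `replay` function, the JSON line format and the `sync_each_write` flag are all assumptions made for this example. Syncing on every write is only loosely modelled on the "batch" option named on the slide.

```python
import json
import os
import tempfile


class CommitLog:
    """Append-only log. Every name here is hypothetical, not Cassandra's API."""

    def __init__(self, path, sync_each_write=True):
        self.path = path
        # Assumption: True forces each record to disk before acknowledging.
        self.sync_each_write = sync_each_write
        self._file = open(path, "a", encoding="utf-8")

    def append(self, key, value):
        # Records are stored strictly in insertion order.
        self._file.write(json.dumps({"key": key, "value": value}) + "\n")
        self._file.flush()
        if self.sync_each_write:
            os.fsync(self._file.fileno())
        return True  # acknowledged only after the log write has happened

    def close(self):
        self._file.close()


def replay(path):
    """Rebuild in-memory state after a crash by re-reading the log in order."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            state[record["key"]] = record["value"]  # later records win
    return state


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
log = CommitLog(log_path)
log.append("richard", {"email": "rlow@acunu.com"})
log.append("k_0", {"note": "illustrative value"})
log.close()
recovered = replay(log_path)
```

Because the log is only ever appended to, writing it is sequential I/O, and replaying it in order reproduces the state the memtable held before the crash.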
    8. Memtable
       [Diagram: Memtable]
    9. Memtable
       • In memory store of insertions
       • ConcurrentSkipListMap
       • When too large, flushed to disk
       • Ensures all writes to disk are sequential
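
A rough sketch of the memtable idea follows. Python has no skip list in its standard library, so a sorted key list maintained with `bisect` stands in for `ConcurrentSkipListMap`; unlike the real structure it is not thread-safe. The `Memtable` class, its method names and the `max_entries` threshold are hypothetical.

```python
import bisect


class Memtable:
    """Sorted in-memory write buffer (hypothetical sketch, not thread-safe)."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # illustrative flush threshold
        self._keys = []                 # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def is_full(self):
        return len(self._keys) >= self.max_entries

    def flush(self):
        """Return all rows in key order and reset the buffer.

        The rows come out already sorted, so they can be written to a new
        SSTable file in one sequential pass.
        """
        rows = [(k, self._values[k]) for k in self._keys]
        self._keys, self._values = [], {}
        return rows


memtable = Memtable(max_entries=3)
flushed = []
for key, value in [("k_2", "c"), ("k_0", "a"), ("k_1", "b")]:
    memtable.put(key, value)
    if memtable.is_full():
        flushed.append(memtable.flush())
```

Inserts arrive in arbitrary order, but each flush hands back rows sorted by key, which is what allows the on-disk write to be sequential.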
    10. SSTable
        [Diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data }]
    11. SSTables
        • Stores actual data, sorted by key
        • Contains a Bloom filter and index to help find keys
        • Read only
    12. Bloom filters
        • Probabilistic data structure
        • Answers membership queries:
          • ‘Does the set contain x?’
        • Can give false positives, never false negatives
        • Space efficient
        • Typical size: 1 byte per key
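
To make the false-positive / no-false-negative property concrete, here is a minimal Bloom filter sketch. The `BloomFilter` class, the choice of salted SHA-256 as the hash family, and the default sizes are assumptions for illustration; they are not what Cassandra uses.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter (hypothetical sketch)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter()
for key in ["k_0", "k_1", "k_2", "k_3"]:
    bloom.add(key)
```

Every added key sets its bits, so a lookup for it can never come back false. A key that was never added may still find all of its bits set by other keys, which is the source of false positives.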
    13. How it works together
        [Diagram of one SSTable in three parts]
        • Bloom filter: 011010111010010
        • Index: k_0 -> 0, k_128 -> 4582, k_256 -> 9242
        • Data: rows k_0, k_1, k_2, k_3, ... stored in key order
    14. How it works together
        [Same diagram, with a question under each part]
        • Bloom filter: Contains x?
        • Index: Where is x?
        • Data: Retrieve x
    15. How it works together
        [Same diagram, with "Memory" and "Disk" labels added]
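
The index in the diagram lists only k_0, k_128 and k_256, which suggests that it samples keys rather than listing every one. Under that assumption, "Where is x?" becomes: find the last sampled key that is not greater than x and start scanning the data file from its offset. The sketch below uses integers as stand-ins for the keys so that ordering is unambiguous; `scan_start_offset` is a hypothetical helper.

```python
import bisect

# Sampled index taken from the slide: key number and byte offset.
index_keys = [0, 128, 256]       # integer stand-ins for k_0, k_128, k_256
index_offsets = [0, 4582, 9242]


def scan_start_offset(key):
    """Return the offset of the last sampled key <= key.

    Returns None if the key sorts before everything in this SSTable.
    """
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None
    return index_offsets[i]
```

For example, a lookup for key 130 would start reading at offset 4582 and scan forward through at most one sampling interval of rows.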
    16. Point queries
        [Diagram: several memtables holding entries for k_0, k_1, k_2; several SSTables, each with an index (k_0 -> 0, k_128 -> 4582, k_256 -> 9242) and a data file (k_0, k_1, k_2, k_3, ...)]
    17. Point queries
        [Same diagram, step 1 highlighted]
        1. Query filter
    18. Point queries
        [Same diagram, step 2 highlighted]
        1. Query filter
        2. Find location
    19. Point queries
        [Same diagram, step 3 highlighted]
        1. Query filter
        2. Find location
        3. Read data
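
The three steps above can be sketched as follows. This is an illustration under several assumptions: the `SSTable` class and `point_query` function are hypothetical, a Python `set` stands in for the Bloom filter (so it never gives false positives), and each stored value carries a timestamp so that the newest version wins when a key appears in more than one place.

```python
class SSTable:
    """Immutable table with filter, index and data (hypothetical stand-ins)."""

    def __init__(self, rows):
        # rows: {key: (timestamp, value)}
        self.data = sorted(rows.items())                 # the "data file"
        self.index = {k: i for i, (k, _) in enumerate(self.data)}
        self.filter = set(rows)    # stand-in for a Bloom filter
        self.reads = 0             # counts simulated data-file reads

    def get(self, key):
        if key not in self.filter:          # 1. query filter
            return None
        position = self.index.get(key)      # 2. find location
        if position is None:                # would happen on a false positive
            return None
        self.reads += 1
        return self.data[position][1]       # 3. read data


def point_query(key, memtable, sstables):
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for table in sstables:
        hit = table.get(key)
        if hit is not None:
            candidates.append(hit)
    if not candidates:
        return None
    # Assumption: the version with the highest timestamp wins.
    return max(candidates, key=lambda c: c[0])[1]


old = SSTable({"k_0": (1, "a"), "k_1": (1, "b")})
new = SSTable({"k_1": (2, "b2"), "k_2": (2, "c")})
memtable = {"k_2": (3, "c3")}
```

The `reads` counter shows the benefit of the filter: an SSTable whose filter rejects the key is never touched on disk.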
    20. Range queries
        • Bloom filters useless
        • Use index to locate portion of SSTable
        • Read data, merge results
        • Necessary to look up in every SSTable data file
        • Disk I/O proportional to #SSTables
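
A range query has to visit every SSTable and merge what it finds. The sketch below models each SSTable as a key-sorted list of `(key, timestamp, value)` tuples; that layout, the `range_query` function and the newest-timestamp-wins rule are assumptions made for this example.

```python
import bisect
import heapq


def range_query(sstables, start, end):
    """Return {key: value} for start <= key < end across all SSTables.

    Each SSTable is a list of (key, timestamp, value) tuples sorted by key.
    """
    slices = []
    for rows in sstables:
        keys = [row[0] for row in rows]
        lo = bisect.bisect_left(keys, start)   # index lookup: where to start
        hi = bisect.bisect_left(keys, end)
        slices.append(rows[lo:hi])             # one read per SSTable
    merged = {}
    # heapq.merge streams the already-sorted slices in key order. For equal
    # keys the tuples compare by timestamp next, so newer versions come last
    # and overwrite older ones.
    for key, timestamp, value in heapq.merge(*slices):
        merged[key] = value
    return merged


sstable_a = [("k_0", 1, "a"), ("k_1", 1, "b"), ("k_3", 1, "d")]
sstable_b = [("k_1", 2, "b2"), ("k_2", 2, "c")]
result = range_query([sstable_a, sstable_b], "k_1", "k_3")
```

The loop over `sstables` is why the cost grows with the number of SSTables: each one contributes a slice, whether or not it holds the newest version of any row.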
    21. Compaction
        • Merges SSTables
        • Removes overwrites and obsolete tombstones
        • Improves range query performance
        • Major compaction creates one SSTable
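
A compaction pass can be sketched as a merge that keeps only the newest version of each key and discards tombstones that are old enough. Everything below is a simplification under stated assumptions: SSTables are key-sorted lists of `(key, timestamp, value)`, a value of `None` marks a tombstone, and `gc_before` is a hypothetical cut-off timestamp deciding when a tombstone counts as obsolete.

```python
import heapq

TOMBSTONE = None  # hypothetical marker for a deleted row


def compact(sstables, gc_before):
    """Merge sorted SSTables into one sorted list.

    Only the newest version of each key survives. Tombstones older than
    gc_before are dropped entirely. Merging all inputs into one output
    corresponds to a major compaction in this sketch.
    """
    newest = {}
    ordered = heapq.merge(*sstables, key=lambda r: (r[0], r[1]))
    for key, timestamp, value in ordered:
        newest[key] = (timestamp, value)   # later timestamps overwrite earlier
    output = []
    for key, (timestamp, value) in newest.items():
        if value is TOMBSTONE and timestamp < gc_before:
            continue                       # obsolete tombstone: discard
        output.append((key, timestamp, value))
    return output


older = [("k_0", 1, "a"), ("k_1", 1, "b"), ("k_2", 1, "c")]
newer = [("k_1", 5, "b2"), ("k_2", 6, TOMBSTONE)]
compacted = compact([older, newer], gc_before=10)
```

After the merge, a range query only has one file to read instead of two, and the overwritten `k_1` and deleted `k_2` no longer occupy space.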
    22. Write optimised
        • All writes are sequential on disk
        • Each write is written multiple times during compactions
        • Bloom filters mean approx. one I/O per read
        • Avoid a read-modify-write data model
    23. Scaling
        • In memory:
          • Buffers
          • Memtables
          • Bloom filters
          • Index
        • If not enough memory, significant performance impact
    24. Repair: Merkle Trees
        • Repair builds a Merkle tree
        • Compared with replicas
        • Efficient
        • If differences are found, portions of SSTables are streamed
        • Requires full disk scan to build
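
The comparison step can be illustrated with a small binary hash tree. This is a sketch of the general Merkle tree technique, not Cassandra's repair code: `build_tree`, `differing_ranges`, the SHA-256 hash and the power-of-two leaf count are all assumptions, and both trees are assumed to have the same shape.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return tree levels, leaf hashes first and the root last.

    leaves: list of bytes, one per key range. The length must be a power
    of two in this simplified sketch.
    """
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_ranges(tree_a, tree_b):
    """Walk down from the root, descending only where hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [child
                    for node in suspects
                    for child in (2 * node, 2 * node + 1)
                    if tree_a[depth][child] != tree_b[depth][child]]
    return suspects


replica_a = [b"rows 0-99", b"rows 100-199", b"rows 200-299", b"rows 300-399"]
replica_b = list(replica_a)
replica_b[2] = b"rows 200-299 (stale)"
out_of_sync = differing_ranges(build_tree(replica_a), build_tree(replica_b))
```

Only the hashes are exchanged and compared, which is what makes the comparison cheap; building the leaves is the expensive part, because every row has to be read and hashed.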
    25. Snapshot
        • For backup, want consistent set of SSTables
        • nodetool snapshot does this
        • Creates hard links to existing SSTables
        • Implies data will be copied after a few compactions
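
The hard-link idea can be tried out with the standard library alone. The sketch below is not what `nodetool snapshot` runs internally; the `snapshot` helper, the directory layout and the file names are hypothetical, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every file in data_dir into snapshot_dir (hypothetical helper)."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))
            linked.append(name)
    return linked


root = tempfile.mkdtemp()
data_dir = os.path.join(root, "data")
os.makedirs(data_dir)
for name in ["sstable-1-Data.db", "sstable-2-Data.db"]:  # made-up file names
    with open(os.path.join(data_dir, name), "w") as f:
        f.write("rows")

snapshot_dir = os.path.join(root, "snapshots", "backup")
linked = snapshot(data_dir, snapshot_dir)
# Simulate compaction deleting an input SSTable: the snapshot link survives.
os.remove(os.path.join(data_dir, "sstable-1-Data.db"))
```

Taking the snapshot copies no data, because SSTables are read only and a link is just a second name for the same file. Once compaction removes the live names, the snapshot's links become the only references to the old files, so from then on the snapshot occupies disk space of its own.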
    26. Summary
        • How writes end up on disk
        • How point queries and range queries find the data
        • Implications
        • Repair
        • Snapshot
